THE EFFECT ON RECALL AND RECOGNITION OF THE 
EXAMINATION SET IN CLASSROOM SITUATIONS* 


GEORGE MEYER 
Psychological Laboratory, University of Michigan 


In two recent papers*‘ the author showed that the type of examina- 
tion for which an individual prepared was of fundamental importance 
in determining what the individual learned and remembered under 
certain laboratory conditions. Four equated groups of subjects, 
who prepared for a true-false, multiple-choice, completion, or essay 
test respectively, were asked to study two mimeographed chapters of 
unfamiliar historical material on campaigns during the Civil War. 
After three two-hour supervised study periods the four groups were 
tested on the day following the last study period by all four types 
of tests. The tests were repeated after an interval of five weeks. 
The results showed that the groups with the recall examination sets 
were superior to the groups with the recognition examination sets 
except on the recognition tests given on the day following the last 
study period. The group with the completion type of recall examina- 
tion set was found to be superior to the group with the essay type 
on both the first and the second completion tests. The group with 
the essay type of recall examination set was found to be superior to 
that with the completion type on the two essay tests. No differences 
were found between the two groups with recognition examination 
sets on any of the various types of tests. 

The differences found were explained with reference to the effect
of the examination set on the number and kinds of study methods
used during the learning. The individuals in the essay group reported
that they attempted to organize the material. They used predominantly
such methods as the making of summaries and maps. Fewer
individuals in this group used mere passive reading or only one of
the active study methods than in the other three groups. The individuals
in the other groups, in general, reported that they attempted to
learn the details only. In these groups such methods as underlining,
making out of practice test questions, and taking of random notes
were used predominantly. The individuals in the completion group,
however, reported that they expended more effort in learning the details
than if they had been preparing for a recognition test.

* This study was made possible by a grant from the Committee on the Faculty
Research Fund of the University of Michigan.

As the study just discussed was carried out under laboratory 
conditions the question arises as to what differences in results would 
be found under conditions which more nearly approach the classroom 
situation. The present investigation attempts to answer that question. 


EXPERIMENT 


The Subjects.—One hundred twenty-four students from the second
semester of the year course in elementary psychology at the University
of Michigan during the school year 1934-1935 served as subjects.
The subjects were divided into four groups: True-false (T-F), completion,
multiple-choice (M-C), and essay. These groups were matched
rather than equated by the mean-standard deviation method which
had been used in the previous study.

The bases on which the matching was done were as follows: Year 
in college, age, sex, achievement in the course up to the time of the 
experiment, and the time of meeting of the course recitation section. 

The Learning Material.—The chapter on memory from Pillsbury’s 
text’ formed the basis of the learning material. This was supple- 
mented in the lectures and recitation sections. All of the subjects 
heard the same three course lectures on the subject. They also heard 
approximately the same discussions in the two recitation sections which 
occurred during the course of the experiment as the writer took over
all of the sections in the course from which the subjects came. 

The Tests.—Of the four types of tests (essay, completion, true-false,
and multiple-choice) the essay consisted of six questions on
major topics covered in the learning material. The completion, 
true-false, and multiple-choice tests each contained one hundred items 
which covered the same material. These questions were drawn 
about equally from each of the major topics tested by the six essay 
questions. The position of the items in the several tests was deter- 
mined by chance so that any influence of position would be eliminated. 

The test items were validated by the method of judgments. The 
author and an assistant in the course drew up lists of what they 
believed to be the important facts covered by the six essay topics. 
Those facts which seemed most adequate to both individuals were then
selected and cast in the various question forms. A final judgment 
on the items was made by the lecturer in the course. 

The reliability or consistency coefficients of the tests are given 
in Table I. These coefficients were found by correlating the scores 
on the first and second testings. Two coefficients, one in which the 
scores were uncorrected and the other in which the scores were cor- 
rected* for chance, were found for both the multiple-choice and true- 
false tests. Three coefficients were determined for the essay test as 
follows: By correlating the scores on the same points tested on the 
other examinations (necessary points in the table), by correlating the 
scores on facts other than those tested by the other examinations 
(other items in the table), and by correlating the ratings on the organi- 
zation of the material. 


TABLE I.—CORRELATIONS BETWEEN THE FIRST AND SECOND TESTS FOR THE GROUP
OF ONE HUNDRED TWENTY-FOUR SUBJECTS

[Row labels illegible in source]

    [illegible] ........ .69 ± .03
    [illegible] ........ .69 ± .03
    [illegible] ........ .70 ± .03
    [illegible] ........ .69 ± .03
    [illegible] ........ .74 ± .03
    [illegible] ........ .67 ± .03
    [illegible] ........ .69 ± .03
    [illegible] ........ .82 ± .02

Further evidence as to the reliability of the three types of objec- 
tive tests was found by using the method of chance halves and correct- 
ing the coefficients so obtained by using the Spearman prophecy 
formula.† For the multiple-choice, true-false, and completion tests
the corrected coefficients were respectively: .77, .70, and .84. 
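
The two footnoted formulas, the correction for chance and the Spearman prophecy formula, can be sketched in a few lines. This is a minimal illustration, not the author's own computation: it assumes the conventional readings of the symbols (R = number right, W = number wrong, n = number of alternatives per item in the first formula; r = correlation between chance halves and n = lengthening factor, here 2, in the second). The function names and the example scores are hypothetical.

```python
def corrected_for_chance(right: float, wrong: float, n_choices: int) -> float:
    """Score corrected for chance: S = R - W / (n - 1).

    n_choices is the number of alternatives per item (assumed: 2 for a
    true-false item, 3 for the three-option multiple-choice items shown
    in the directions).
    """
    return right - wrong / (n_choices - 1)


def spearman_brown(r_half: float, n: float = 2.0) -> float:
    """Spearman prophecy formula: r_nn = n*r / (1 + (n - 1)*r).

    With n = 2 this steps a chance-halves correlation up to an estimate
    of the reliability of the full-length test.
    """
    return n * r_half / (1 + (n - 1) * r_half)


# Hypothetical subject: 70 items right, 30 wrong, none omitted.
print(corrected_for_chance(70, 30, 2))  # true-false item format
print(corrected_for_chance(70, 30, 3))  # three-option multiple-choice format

# Hypothetical half-test correlation of .50.
print(round(spearman_brown(0.50), 2))
```

Omitted items count neither as right nor as wrong in this sketch, which is one reason a corrected mean cannot be recovered from an uncorrected mean alone.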

* $S = R - \dfrac{W}{n - 1}$

† $r_{nn} = \dfrac{nr}{1 + (n - 1)r}$

The Experimental Procedure.—Along with the lectures and recitations
the subjects were given three supervised study periods which
were held from 3:30 to 5:30 o’clock in the afternoons or from 7:30 to
9:30 in the evenings. At the first of these the subject was given
mimeographed sheets of directions on which there was a number. The
directions were as follows:


The experiment which you are going to carry out is one whose purpose is
to try to discover whether it is better to study for an essay, true-false,
completion, or multiple-choice test. Examples of the various types of questions
are given below. These questions are on the material on pages 482-483 in
your text.

Essay Question. 

1. Discuss as fully as possible the three methods which have been used in 
studying rote memory. What does each of these methods measure? 

True-false Questions. 

1. Ebbinghaus used the prompting method in his studies of rote memory. 

2. The “savings method” measures potential memory.

Completion Questions.

1. Ebbinghaus used the ________ method in his studies of rote memory.

2. The “savings method” measures ________.

Multiple-choice Questions.

1. Ebbinghaus used the (savings, prompting, paired associates) method
in his studies of rote memory.

2. The “savings method” measures (actual recall, potential memory,
recognition).

You are to prepare yourself for a ________* test on the subject of
memory. To prepare yourself for this test you are to study Chapter XVI in 
your textbook, your lecture notes, and your recitation notes. 

You will have three study periods in which to prepare for this examination. 
These study periods will come on April 17th, 18th, and 24th. On April 25th 
you will be given the first test on the material you have studied and on May 
23rd you will be given a second test. 

In preparing for the type of test assigned to you, use the methods which 
you ordinarily use in preparing for that kind of test. Any notes that you take 
during the study periods are to be turned in on April 24th at the close of the 
period. At the same time allow the experimenter or assistant to examine 
your textbook and your lecture and class notes. The notes which you turn 
in will be returned to you on May 23rd. You are not to study any of this 
material except during the supervised study periods. 

Each time you come to the experiment you must report your number when 
arriving and when leaving to the experimenter or assistant. 

At the close of each study period hand in a brief statement (preferably in 
outline form) concerning how you spent your time. At the end of the third 
study period besides the foregoing report hand in answers to the following 
questions. 

Did you study differently for the type of examination assigned to you from 
what you would have if you had been assigned one of the other types of 
examination? If so, what different methods did you use? Be sure you indi- 
cate the methods you used for the assigned type and the methods you would 
have used if you had been studying for each of the other types. 

* In this space was placed the name of the type of examination for which the
individual was to prepare.

Did you feel, before the three supervised study periods were over, that
you had studied the material as thoroughly as you would have studied it 
under ordinary circumstances if preparing for the type of examination 
assigned to you? If so, indicate in your report the time you would have 
stopped studying and would have been ready to take the test. If, on the 
other hand, you feel that not enough time was allowed during the study 
periods to prepare for the type of examination assigned to you indicate how 
much more time you would have spent if you had been studying as you usually 
do. Compare this with the amount you think you would have used had you 
been studying for each of the other types of examination. 

Would you, if you had been studying as you ordinarily do, have distributed 
the study periods differently for the type of examination assigned to you? 
If so, indicate how you would have distributed them. Compare this distribu- 
tion with the distribution you would have used had you been studying for 
each of the other types of examination. 

On the foregoing reports and also on the notes which you hand in, place 
only your number. Until all these reports are in you will be given no credit 
for this work. 

If you have any questions with regard to the procedure, ask the experi- 
menter about them before you start studying the material. During the 
course of this experiment do not discuss it with anyone else. 

Above all keep in mind the fact that this is an experiment, the results of 
which are of value only if you follow the directions given to you. 


The testing procedure was the same for both the first and second 
testings. The test periods were given at the same times of day as 
the learning periods. The subjects, however, were allowed three hours 
to complete the tests, one hour and a half for the essay test and the 
remainder for the three objective tests. 

The test order for a particular subject was determined by chance. 
Each of the first twenty-four subjects in each group had one of the 
twenty-four possible orders. Then seven more orders were selected 
by chance from the twenty-four and given to the remaining individuals 
in each group so that in each group a given test order appeared the 
same number of times. This procedure was followed in order to 
equalize the practice effects of the first tests on the later ones. 
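
The counterbalancing scheme described above can be sketched as follows. Four tests admit 4! = 24 orders; each group of thirty-one subjects receives all twenty-four once, plus seven orders drawn by chance and reused in every group. This is only an illustrative sketch: the function names, the seeds, and the use of a pseudo-random generator are assumptions, since the source says only that the selection was made "by chance".

```python
import itertools
import random
from collections import Counter

TESTS = ("essay", "completion", "true-false", "multiple-choice")


def build_order_schedule(group_size=31, seed=1935):
    """Return one test order per subject slot in a group.

    All 24 permutations appear once; the remaining slots are filled
    with distinct orders sampled by chance from those 24.
    """
    rng = random.Random(seed)
    all_orders = list(itertools.permutations(TESTS))  # 4! = 24 orders
    extras = rng.sample(all_orders, group_size - len(all_orders))
    return all_orders + extras


def assign_to_group(subject_ids, schedule, seed=0):
    """Hand the schedule's orders to a group's subjects in random order."""
    rng = random.Random(seed)
    shuffled = list(schedule)
    rng.shuffle(shuffled)
    return dict(zip(subject_ids, shuffled))


schedule = build_order_schedule()
counts = Counter(schedule)
# Slots, distinct orders, and number of orders used twice.
print(len(schedule), len(counts), sum(1 for c in counts.values() if c == 2))
# prints: 31 24 7

# Reusing the same schedule for each group keeps the frequency of every
# order identical across the four groups.
essay_group = assign_to_group(range(1, 32), schedule, seed=1)
```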

Scoring of the Tests.—The true-false and multiple-choice tests
were corrected by assistants using prepared keys. The completion 
tests were scored by the writer, and the essay tests by both the writer 
and an assistant in the course. The three sets of scorings (necessary 
points, other items, rating for organization) on the essay tests were 
made absolutely independently by each of the scorers, no marks being 
placed on the papers by them. For each of the points tested by the 
objective tests stated correctly one point was given under “necessary
points.” For any point not tested by the objective examinations 
but stated correctly a point was given under “other items.” Rating 
of the organization of the answers was made on a ten point basis, a 
zero being given to a paper where facts were stated haphazardly, and 
ten being given to a paper which seemed to be as well organized as 
the learning material. 

The results on the essay tests in the following section are based 
on the averages of the two scorings. That these scores are reliable 
is indicated by the fact that the correlations between the scores given 
by the two scorers ranged between .83 ± .02 and .91 ± .01.

Test Results.—In Table II are given the experimental data on the
true-false tests. The means and standard deviations of the distribu- 
tions and the standard deviations of the means are tabulated for the 
true-false, multiple-choice, completion, and essay examination set 
groups on the true-false tests both when the scores were uncorrected 
(U) and corrected (C) for chance. 


TABLE II.—EXPERIMENTAL DATA FOR THE VARIOUS GROUPS ON THE TRUE-FALSE TESTS

                 |    |       First test       |      Second test
Group            | N  | Mean  |  SD   | SD_M  | Mean  |  SD   | SD_M
T-F (U)          | 31 | 67.81 |  6.58 | 1.18  | 64.07 |  6.34 | 1.14
T-F (C)          | 31 | 38.44 | 13.53 | 2.43  | 32.31 | 10.56 | 1.90
M-C (U)          | 31 | 66.26 |  5.14 |  .92  | 61.74 |  8.64 | 1.55
M-C (C)          | 31 | 35.82 |  9.63 | 1.73  | 30.60 | 12.69 | 2.28
Completion (U)   | 31 | 71.61 |  8.08 | 1.45  | 63.81 |  8.64 | 1.55
Completion (C)   | 31 | 45.69 | 13.77 | 2.47  | 35.15 | 11.67 | 2.10
Essay (U)        | 31 | 71.23 |  8.16 | 1.47  | 67.94 |  9.20 | 1.65
Essay (C)        | 31 | 45.79 | 13.38 | 2.40  | 39.89 | 16.02 | 2.88

Critical ratios* computed from the data in Table II are given
in Table III. In Table III the items in the columns are compared
with the items in the rows so that the critical ratio, 1.04, in the first
column and third row indicates that the mean of the true-false examination
set group is larger than that of the multiple-choice examination
set group on the first true-false test when the scores are uncorrected
for chance. Had the difference in means been in favor of the multiple-choice
group, a minus sign would have been placed in front of the
ratio so that the table could be more readily understood.

* These critical ratios are $D/\sigma_D$. The criterion for a reliable difference used
in this study is $D/\sigma_D = 3$.


TABLE III.—COMPARING THE VARIOUS GROUPS ON THE TRUE-FALSE TESTS IN
TERMS OF THEIR CRITICAL RATIOS

                 |          First test            |          Second test
Group            |  T-F        |  M-C  | Completion |  T-F  |  M-C  | Completion
T-F (U)          |             |       |            |       |       |
T-F (C)          |             |       |            |       |       |
M-C (U)          |  1.04       |       |            |  1.21 |       |
M-C (C)          | [illegible] |       |            |   .58 |       |
Completion (U)   | -2.04       | -3.11 |            |   .13 |  -.94 |
Completion (C)   | -2.10       | -3.27 |            | -1.00 | -1.47 |
Essay (U)        | -1.82       | -2.87 |    .19     | -1.93 | -2.74 |  -1.83
Essay (C)        | -2.15       | -3.37 |   -.03     | -2.20 | -2.53 |  -1.33

The results indicated in Table III are as follows: 

1. On the true-false test given on the day following the last learn- 
ing period the only reliable differences are found between the multiple- 
choice and both the completion and essay examination set groups. 
The latter groups are both superior to the multiple-choice group. 

2. Both the completion and essay groups are superior to the true- 
false examination set group on the first true-false test. The differ- 
ences, although approaching statistical significance, are not reliable. 

3. The true-false group shows a slight superiority over the multiple- 
choice group on the first true-false test especially when the uncorrected 
scores are used. This difference, however, is not reliable. 

4. On the true-false test given four weeks after the last learning 
period no reliable differences are found among any of the groups. 
The same general tendencies found above are indicated with the addi- 
tion of a slight superiority of the essay as compared with the comple- 
tion examination set group. 
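
The critical ratios in Table III can be reproduced from the means and standard deviations in Table II. The sketch below assumes that the standard deviation of a difference was taken as the square root of the sum of the squared standard deviations of the two means, i.e., that the group means were treated as uncorrelated. The source does not state this formula; it is adopted here because it agrees with the tabled values. Function names are illustrative.

```python
import math


def sd_of_mean(sd: float, n: int) -> float:
    """Standard deviation of a mean: SD / sqrt(N)."""
    return sd / math.sqrt(n)


def critical_ratio(mean_col: float, sdm_col: float,
                   mean_row: float, sdm_row: float) -> float:
    """D / sigma_D for column group minus row group.

    Assumption: sigma_D = sqrt(sdm_col**2 + sdm_row**2).
    A positive ratio favors the column group, as in Table III.
    """
    return (mean_col - mean_row) / math.sqrt(sdm_col ** 2 + sdm_row ** 2)


# First true-false test, uncorrected scores: true-false group (column)
# against multiple-choice group (row), using the Table II entries.
print(round(sd_of_mean(6.58, 31), 2))                      # 1.18
print(round(critical_ratio(67.81, 1.18, 66.26, 0.92), 2))  # 1.04

RELIABLE = 3.0  # the author's criterion for a reliable difference
print(abs(critical_ratio(67.81, 1.18, 66.26, 0.92)) >= RELIABLE)  # False
```

The ratio of 1.04 matches the entry discussed in the text, and it falls well short of the criterion of 3, in agreement with the statement that this difference is not reliable.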

The experimental data on the multiple-choice tests are given in 
Table IV. The critical ratios based on these data are to be found in 
Table V. From these ratios the following conclusions seem warranted. 

1. Except for the superiority of the essay over the multiple-choice 
examination set group when the scores are corrected for chance no 
reliable differences are found on the first multiple-choice test. 

2. Both recall examination set groups, however, show superiority
to both recognition groups on the first multiple-choice test. The
differences approach statistical significance in all cases, the chances
being better than ninety-eight in one hundred that true differences are
present.


TABLE IV.—EXPERIMENTAL DATA FOR THE VARIOUS GROUPS ON THE
MULTIPLE-CHOICE TESTS

                 |    |       First test       |      Second test
Group            | N  | Mean  |  SD   | SD_M  | Mean  |  SD   | SD_M
M-C (U)          | 31 | 62.82 |  8.31 | 1.49  | 59.15 |  7.74 | 1.39
M-C (C)          | 31 | 45.31 | 11.31 | 2.03  | 41.44 |  7.56 | 1.36
T-F (U)          | 31 | 63.79 |  6.66 | 1.20  | 60.50 |  6.93 | 1.24
T-F (C)          | 31 | 46.86 |  9.90 | 1.78  | 43.37 | 10.14 | 1.82
Completion (U)   | 31 | 67.76 |  8.37 | 1.50  | 61.08 |  6.63 | 1.19
Completion (C)   | 31 | 52.66 | 11.43 | 2.05  | 45.11 |  8.91 | 1.60
Essay (U)        | 31 | 68.83 |  9.30 | 1.67  | 66.31 |  9.30 | 1.67
Essay (C)        | 31 | 54.40 | 12.24 | 2.20  | 51.21 | 12.45 | 2.24

TABLE V.—COMPARING THE VARIOUS GROUPS ON THE MULTIPLE-CHOICE TESTS
IN TERMS OF THEIR CRITICAL RATIOS

                 |          First test            |          Second test
Group            |  M-C        |  T-F  | Completion |  M-C  |  T-F  | Completion
M-C (U)          |             |       |            |       |       |
M-C (C)          |             |       |            |       |       |
T-F (U)          | [illegible] |       |            |  -.73 |       |
T-F (C)          | [illegible] |       |            |  -.85 |       |
Completion (U)   | -2.34       | -2.07 |            | -1.06 |  -.34 |
Completion (C)   | -2.54       | -2.13 |            | -1.75 |  -.72 |
Essay (U)        | -2.68       | -2.45 |   -.48     | -3.30 | -2.79 |  -2.55
Essay (C)        | -3.04       | -2.67 |   -.58     | -3.73 | -2.71 |  -2.22

3. On the second multiple-choice test the essay examination set
group is superior to the other three groups. A reliable difference
is found between the essay and multiple-choice examination set
groups both when the scores are uncorrected and corrected for chance.
The other differences approach reliability.

4. The completion group is superior to the multiple-choice examina- 
tion set group. The difference, however, is not statistically significant. 
The experimental data on the completion tests are given in Table
VI. The critical ratios computed from these data are shown in Table 
VII. These ratios indicate that: 

1. On the first completion test the essay examination set group 
is superior to the other three groups. The only reliable difference 
is found between the essay and multiple-choice examination set 
groups. 


TABLE VI.—EXPERIMENTAL DATA FOR THE VARIOUS GROUPS ON THE COMPLETION TESTS

             |    |       First test       |      Second test
Group        | N  | Mean  |  SD   | SD_M  | Mean  |  SD   | SD_M
Completion   | 31 | 57.40 | 10.29 | 1.85  | 50.53 | 10.08 | 1.81
T-F          | 31 | 52.57 | 10.26 | 1.84  | 45.98 | 10.53 | 1.89
M-C          | 31 | 50.92 | 10.38 | 1.86  | 47.15 |  9.90 | 1.78
Essay        | 31 | 60.40 | 12.48 | 2.24  | 55.76 | 10.86 | 1.95

2. The completion set group is superior to both the multiple- 
choice and true-false groups on the first completion test. In both 
cases the differences approach statistical significance. 

3. The same findings as above hold for the second completion test 
except that here the difference between the essay and true-false 
examination set groups is reliable. 


TABLE VII.—COMPARING THE VARIOUS GROUPS ON THE COMPLETION TESTS IN
TERMS OF THEIR CRITICAL RATIOS

             |            First test               |           Second test
Group        | Completion  | T-F         | M-C   | Completion  | T-F   | M-C
Completion   |             |             |       |             |       |
T-F          | [illegible] |             |       | [illegible] |       |
M-C          |  2.48       | [illegible] |       |  1.33       |  -.45 |
Essay        | -1.03       | -2.70       | -3.26 | -1.97       | -3.51 | -3.26

In Table VIII are the experimental data on the essay tests for
necessary points (N), other items (I), and organization (O). Table
IX gives the critical ratios based on these data.


TABLE VIII.—EXPERIMENTAL DATA FOR THE VARIOUS GROUPS ON THE ESSAY TESTS

                 |    |       First test       |      Second test
Group            | N  | Mean  |  SD   | SD_M  | Mean  |  SD  | SD_M
Essay (N)        | 31 | 26.36 |  8.46 | 1.52  | 20.61 | 8.30 | 1.49
Essay (I)        | 31 | 29.32 | 11.54 | 2.07  | 21.90 | 7.64 | 1.37
Essay (O)        | 31 |  6.66 |  1.30 |  .23  |  4.75 | 1.24 |  .22
Completion (N)   | 31 | 23.39 |  7.82 | 1.40  | 17.13 | 7.38 | 1.33
Completion (I)   | 31 | 26.29 |  7.96 | 1.43  | 17.39 | 5.40 |  .97
Completion (O)   | 31 |  5.93 |  1.07 |  .19  |  3.78 | 1.07 |  .19
T-F (N)          | 31 | 20.29 |  7.06 | 1.27  | 13.97 | 6.00 | 1.08
T-F (I)          | 31 | 25.07 |  8.14 | 1.46  | 16.55 | 6.14 | 1.10
T-F (O)          | 31 |  5.86 |  1.22 |  .22  |  3.57 | 1.18 |  .21
M-C (N)          | 31 | 17.77 |  6.92 | 1.24  | 14.81 | 6.60 | 1.19
M-C (I)          | 31 | 20.03 |  7.66 | 1.38  | 15.19 | 5.88 | 1.06
M-C (O)          | 31 |  5.06 |  1.35 |  .24  |  3.41 | 1.11 |  .20


TABLE IX.—COMPARING THE VARIOUS GROUPS ON THE ESSAY TESTS IN TERMS
OF THEIR CRITICAL RATIOS

                 |        First test         |        Second test
Group            | Essay | Completion | T-F  | Essay | Completion | T-F
Essay (N)        |       |            |      |       |            |
Essay (I)        |       |            |      |       |            |
Essay (O)        |       |            |      |       |            |
Completion (N)   | 1.43  |            |      | 1.74  |            |
Completion (I)   | 1.21  |            |      | 2.68  |            |
Completion (O)   | 2.45  |            |      | 3.33  |            |
T-F (N)          | 3.06  |   1.64     |      | 3.61  |   1.85     |
T-F (I)          | 1.68  |    .60     |      | 3.04  |    .57     |
T-F (O)          | 2.50  |    .22     |      | 3.87  |    .73     |
M-C (N)          | 4.38  |   3.00     | 1.14 | 3.04  |   1.30     | -.52
M-C (I)          | 3.73  |   3.14     | 2.50 | 3.88  |   1.52     |  .88
M-C (O)          | 4.79  |   2.83     | 2.46 | 4.51  |   1.34     |  .56


The results in Table IX show that:

1. The essay examination set group is superior to the other three
groups on the first essay test when comparisons are made on necessary
points, other items, and organization. The only statistically significant
differences, however, are found between the essay and multiple-choice
groups on all three comparisons, and between the essay and
true-false groups on necessary points.

2. The completion examination set group, in turn, is superior to
both recognition set groups on the first essay test, although here the
only reliable differences are found between the completion and multiple-choice
groups on necessary points and other items.

3. The true-false examination set group is superior to the multiple- 
choice group on the first essay test with respect to all three compari- 
sons. Although none of the three differences are reliable, two (other 
items and organization) approach statistical significance, the chances 
being better than ninety-nine in one hundred that true differences 
are present in both cases. 

4. On the second essay test the essay examination set group is 
again superior to the other three groups. On this test all of the differ- 
ences but those between the essay group and the completion group 
on necessary points and other items are reliable. The two differences 
which are not statistically significant have a greater degree of relia- 
bility than on the first test. 

5. The completion examination set group is superior to the two 
recognition examination set groups on the second essay test. None 
of these differences is statistically significant. 

Results on Methods of Study.—An examination of the subjects’ 
textbooks and notes was made to determine in so far as possible what 
the subjects actually did during the learning periods. It was found 
that the methods used could be classified into five groups: (1) Under- 
lining, checking, or numbering of words, phrases, and sentences in the 
textbook, in lecture and recitation notes, and in the notes which were 
taken during the study periods; (2) the listing of names and numbers; 
(3) the taking of random notes, i.e., notes which were more than listing
but which had no organization; (4) the making of summaries in para- 
graph form; and (5) the framing of practice test questions. 


TABLE X.—COMPARING THE VARIOUS GROUPS AS TO THE NUMBER OF METHODS OF
STUDY USED

             |    |             |             |             |       Critical ratios
Group        | N  | Mean        | SD          | SD_M        | Essay | Completion | T-F
Essay        | 31 | [illegible] | [illegible] | [illegible] |       |            |
Completion   | 31 | 2.23        | 1.24        |  .22        | -.11  |            |
T-F          | 31 | 1.97        | 1.06        |  .19        |  .82  |    .89     |
M-C          | 31 | 2.19        |  .82        |  .15        |  .00  |    .12     | -.93

In Table X are given the data on the average number of the five 
methods of study used by each group and the critical ratios computed 
from these data. Although no statistically significant differences
are to be found, the same consistent trend indicated in the author’s
previous study (4) is shown here, i.e., the true-false group does not
use so many methods as do the other groups.

In Tables XI through XV the various groups are compared with 
respect to the differences in percentages of individuals who used each 
of the five study methods. 


TABLE XI.—COMPARING THE VARIOUS GROUPS AS TO THE PERCENTAGE USING
UNDERLINING

             |    | Number      | Per cent    | SD of    |      Critical ratios
Group        | N* | using       | using       | per cent | Essay | Completion | T-F
             |    | underlining | underlining |          |       |            |
Essay        | 27 |     21      |     78      |   .074   |       |            |
Completion   | 28 |     23      |     82      |   .069   |  -.40 |            |
T-F          | 27 |     24      |     89      |   .056   | -1.18 |    -.79    |
M-C          | 30 |     26      |     87      |   .060   |  -.95 |    -.55    | .24

* N in this and the following four tables is the number of individuals who used 
one or more of the five methods. 


TABLE XII.—COMPARING THE VARIOUS GROUPS AS TO THE PERCENTAGE USING
LISTING

             |    | Number  | Per cent | SD of    |      Critical ratios
Group        | N  | using   | using    | per cent | Essay | Completion | T-F
             |    | listing | listing  |          |       |            |
Essay        | 27 |   21    |    78    |   .074   |       |            |
Completion   | 28 |   22    |    79    |   .073   |  -.10 |            |
T-F          | 27 |   21    |    78    |   .074   |   .00 |     .10    |
M-C          | 30 |   26    |    87    |   .060   |  -.95 |    -.84    | -.95

Although few reliable differences are found in Tables XI through
XV, certain trends, most of which are consistent with those found in
a previous study (4), are indicated. These are:

1. Underlining as a method of study is used more by those groups
with a recognition examination set than by those with a recall examination
set. None of these differences is statistically significant.

2. The listing of names and numbers as a method of study is used
more by the multiple-choice examination set group than by any other
group. Again the differences are not reliable.


TABLE XIII.—COMPARING THE VARIOUS GROUPS AS TO THE PERCENTAGE USING
RANDOM NOTES

             |    | Number | Per cent | SD of    |      Critical ratios
Group        | N  | using  | using    | per cent | Essay | Completion | T-F
             |    | random | random   |          |       |            |
             |    | notes  | notes    |          |       |            |
Essay        | 27 |   6    |    22    |   .074   |       |            |
Completion   | 28 |   5    |    18    |   .068   |   .40 |            |
T-F          | 27 |   8    |    30    |   .082   |  -.73 |   -1.09    |
M-C          | 30 |   3    |    10    |   .054   |  1.30 |     .92    | 2.04

TABLE XIV.—COMPARING THE VARIOUS GROUPS AS TO THE PERCENTAGE USING
SUMMARIES

             |    | Number    | Per cent  | SD of    |      Critical ratios
Group        | N  | using     | using     | per cent | Essay | Completion | T-F
             |    | summaries | summaries |          |       |            |
Essay        | 27 |    20     |    74     |   .079   |       |            |
Completion   | 28 |    16     |    57     |   .089   |  1.43 |            |
T-F          | 27 |     5     |    19     |   .070   |  5.18 |    3.36    |
M-C          | 30 |    11     |    37     |   .088   |  3.13 |    1.60    | -1.61

TABLE XV.—COMPARING THE VARIOUS GROUPS AS TO THE PERCENTAGE USING
QUESTIONS

             |    | Number    | Per cent  | SD of    |      Critical ratios
Group        | N  | using     | using     | per cent | Essay | Completion | T-F
             |    | questions | questions |          |       |            |
Essay        | 27 |     0     |     0     |   .000   |       |            |
Completion   | 28 |     3     |    11     |   .056   | -1.96 |            |
T-F          | 27 |     3     |    11     |   .056   | -1.96 |     .00    |
M-C          | 30 |     0     |     0     |   .000   |   .00 |    1.96    | 1.96

3. The taking of random notes is used less by the multiple-choice 
examination set group than by any other group, whereas the true-false 
group uses this method more than any of the other groups. None of
these differences is statistically significant. 

4. The making of summaries as a method of study is used more by 
the recall examination set groups than by the recognition groups. 
The differences between the essay and both the true-false and multiple- 
choice examination set groups and between the completion and true- 
false examination set groups are reliable. In turn, the essay group 
uses this method more than the completion group, and the multiple- 
choice group uses it more than does the true-false group. These 
latter differences, however, are not reliable. 

5. The completion and true-false examination set groups use the 
making out of practice test questions as a method of study more than 
do the essay and multiple-choice examination set groups. These 
differences are not reliable. 
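
The critical ratios in Tables XI through XV follow the same pattern as those for the means, with percentages in place of mean scores. The sketch below is illustrative only. It assumes the standard deviation of a difference is the square root of the sum of the squared standard deviations of the two percentages, which is not stated in the source but agrees with the tabled ratios. It also notes, as an observation rather than a documented fact, that the tabled standard deviations agree with sqrt(pq/n) when n is taken as 31 (the full group size) and not as the N column of those tables. Function names are hypothetical.

```python
import math


def sd_of_proportion(p: float, n: int) -> float:
    """Standard deviation of a proportion: sqrt(p * q / n)."""
    return math.sqrt(p * (1.0 - p) / n)


def critical_ratio_pct(p_col: float, sd_col: float,
                       p_row: float, sd_row: float) -> float:
    """Difference in proportions (column minus row) over its assumed SD."""
    return (p_col - p_row) / math.sqrt(sd_col ** 2 + sd_row ** 2)


# Table XIV (summaries), using the tabled percentages and SDs.
essay_vs_completion = critical_ratio_pct(0.74, 0.079, 0.57, 0.089)
essay_vs_mc = critical_ratio_pct(0.74, 0.079, 0.37, 0.088)
print(round(essay_vs_completion, 2))  # 1.43, not reliable
print(round(essay_vs_mc, 2))          # 3.13, reliable by the criterion of 3

# Assumption check: the tabled SD of .079 for 74 per cent is matched by n = 31.
print(round(sd_of_proportion(0.74, 31), 3))  # 0.079
```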

The analysis made of the subjects’ answers to the questions on 
the manner in which they prepared for the various types of test shows: 

1. Of the one hundred twenty-four subjects ninety-five, or seventy- 
seven per cent (seventy-one per cent of the essay group, eighty-one 
per cent of the multiple-choice group, and seventy-one per cent of 
the completion and true-false groups) hold that they prepared for the 
type of examination assigned to them differently from what they 
would have for other types. The chief difference in preparation is 
between the essay and the objective test groups. Of the ninety-five 
subjects who claim that they study differently, eighty-three, or
eighty-seven per cent, hold that for any type of objective test only the 
details need be emphasized in studying; whereas for an essay test, in 
addition to learning the details, the material must be organized. 

2. Of the ninety-five subjects who say that they study differently 
for the several types of examination, forty-six, or forty-eight per 
cent, hold that they ordinarily spend more time in preparing for an 
essay test than for the objective tests; thirty-seven, or thirty-nine per cent,
believe that they spend an equal amount of time in preparing for both 
old and new types of test; and twelve, or thirteen per cent, say that 
they spend more time in preparing for the new types of test. 

3. Twenty-four, or twenty-five per cent, of the ninety-five subjects 
who say that they study differently for the various kinds of tests 
maintain that they ordinarily spend more time and put in more effort 
in preparing for a completion test than for the other two types of 
objective test. 

4. Eight, or eight per cent, of the ninety-five subjects who hold
that they study differently for the several kinds of tests believe that
they spend more time in studying for a true-false than for a multiple-choice
test.

5. As far as the distribution of practice is concerned, no differ- 
ences were indicated by the subjects’ reports. The subjects were 
unanimous in believing that the type of examination being prepared 
for would make no difference in the time chosen to do their studying. 

Discussion.—The present study was carried out under conditions 
which approached the actual classroom situation. The learning 
material was part of the course work, and was provided by a textbook, 
lectures, and discussions in recitation sections. It is to be assumed 
that, since the students knew they would be held responsible for this 
material on the final examination in the course, they were as highly 
motivated as in their ordinary work. What the effect of the three 
supervised study periods was cannot be gauged. They, of course, 
made the learning situation somewhat artificial. However, these 
supervised study periods were believed to be necessary in the present 
investigation in order that the results could be evaluated with reference 
to known variables. It was felt that, if the subjects were allowed 
to study for the six hours wherever and whenever they wished, any 
attempt at an interpretation of the results would have been futile. 
Besides, comparisons between the results of the present study and the 
writer’s previous investigation would have been impossible since the 
learning conditions would not have been comparable. 

The test results of the present study clearly indicate that the recall 
examination set groups do better on all tests than the recognition 
set groups. Many of the differences found were not statistically 
significant. However, the fact that these differences appear con- 
sistently between these groups in this study, together with the fact 
that most of the differences are consistent with those found in the 
earlier study, seems to indicate that they are real differences. 

Not only is the essay examination set group superior to the recogni- 
tion groups on all tests but it is also superior to the completion group 
on all but one of the tests. The differences between these two groups 
are not very reliable on the first tests, but on the second tests all 
of the differences at least approach statistical significance. This 
increase in reliability of the differences on the second test is found not 
only when the essay and completion examination set groups are com- 
pared, but also is found generally when the essay group is compared 
with the recognition groups. 

Although there is little evidence of differences between the true- 
false and multiple-choice examination set groups, there is a very 
slight indication of a superiority of the true-false group when the 
data in this and in the earlier investigation are both considered. 

Considered in the light of the methods of study used by the various 
groups, whether it be from the subjects’ own reports or from the 
analysis of what the subjects actually did during the learning periods, 
it seems that the differences in test results are dependent upon the 
differences in these methods. The outstanding and consistent differ- 
ences in methods of study found both in this and the earlier study 
seem to lie in the combination of methods which are used in preparing 
for the various types of examination. For the recall examinations, 
particularly for the essay type, some method, such as the making of a 
summary, which will aid in the organization of the material is used 
along with methods which are suitable to learning the detail. On 
the other hand, for the recognition tests, methods which will aid in 
the assimilation of detail, such as underlining, listing, and taking 
random notes, are used predominantly, with little attention being 
paid to organization. That it is the types of method, not merely 
the number of methods used, which matter is indicated in the present 
study, where the differences in the number of methods used by the 
various groups were practically negligible. 

A comparison of the test results of this and the earlier investiga- 
tion shows three major differences. In the earlier investigation the 
recall examination set groups were not superior to the recognition 
groups on the first recognition tests, whereas in the present study 
the reverse is true. Secondly, in the earlier investigation the comple- 
tion examination set group was superior to the essay group on the com- 
pletion tests whereas in the present study the latter is superior. 
Thirdly, the rate of forgetting was greater for the recognition than 
for the recall examination set groups on the recognition tests in the 
earlier study. In the present study such a difference is not 
found between the recognition and completion examination set 
groups. 

The following explanation is suggested to account for the first 
two differences. The fact that the present investigation was carried 
out under conditions more nearly approaching those of the classroom 
may be the factor responsible. The writer believes that the major 
difference between the two experimental situations lies in the learning 
material itself. In the earlier investigation the learning material 
was less meaningful to the subjects in that none of them had had 
previous experience with the particular type of material. On the 
other hand, in the present study the material was an integral part of 
the course work and as such the subjects had a more suitable back- 
ground for understanding it. They, therefore, could more readily 
organize it. Since the attempt at organization is the essential differ- 
ence in the methods of study not only between the recognition and 
recall examination set groups but also between the essay and comple- 
tion set groups, it is possible that the differences in favor of the recall 
groups as compared with the recognition groups on the first recognition 
tests, and the differences in favor of the essay group as compared with 
the completion group on the completion tests, are all brought about 
by the fact that the material can be more readily organized. Second- 
arily, the lectures and recitation sections might help reinforce this 
tendency toward organization. 

A very tentative suggestion is offered to explain the third differ- 
ence. English, Welborn and Killian² have found recently that reten- 
tion varies with the manner of statement of test items. When on 
true-false tests verbatim and paraphrased or verbatim and summary 
statements from the learning materials were given to test retention 
it was found that there was less loss with paraphrase and with summary 
than with verbatim items. Since in the earlier study verbatim and 
paraphrased items were used predominantly and in the present study 
the items were mainly of the paraphrase and summary types, it is 
possible that the differences in the rate of forgetting shown by these 
two studies may be the result of this. Such an explanation would 
hold only if verbatim items, on the one hand, and paraphrase and 
summary items, on the other, act differentially with respect to the 
type of examination set with which the individual has studied, a 
matter which has yet to be experimentally determined. 

Provided that the first two differences between the results in this 
and the earlier investigation are due to the causes suggested, then the 
case which the author has previously presented against the objective 
tests and in favor of the essay test is further strengthened. To achieve 
the greatest economy in learning students should prepare for all types 
of examinations with organization of the materials in mind. Objec- 
tive test examination sets, in general, seem to be conducive merely 
to the learning or formation of rote responses. The essay type of 
examination set brings the organization factor into play more promi- 
nently so that not only is more material learned but what is learned is 
better organized. For these reasons it is suggested that the objective 
types of test items, particularly the recognition varieties, should never 
be used exclusively on a given examination unless the students have 
no intimation of the type of test to expect. 

The writer does not agree with Davis and Moore’s recent evaluation 
of measures of retention.¹ They say: 


When it is desired to develop attitudes, concepts, and problem-solving 
ability, the recognition method is most appropriate. In the case of recall 
the pupil is expected to search out and revive, either spontaneously or with 
the aid of cues, the previously learned material and the emphasis is 
clearly upon memorization. In recognition the pupil is provided with sug- 
gested answers, some of which are correct, and the stress is upon judgment and 
reasoning rather than upon the reproduction of ideas and facts. With renewed 
emphasis upon problem solving it is evident that recognition as a measure of 
retention will become a fruitful method for estimating teaching efficiency in 
the future. 


The investigations which the writer has reported seem to indicate 
that the reverse is true. The evidence from these studies shows that 
with recognition tests the students’ emphasis in studying is on memoriz- 
ing detail, whereas with the recall test the additional factor of organiza- 
tion is brought into play. The development of attitudes, concepts, 
and problem-solving ability should be brought about more readily 
by studying for the recall types of test since organization is, no doubt, 
a factor in such development. There is no evidence as yet available 
which indicates that either recall tests or recognition tests cannot test 
these capacities after they have developed. Whether or not they are 
tested seems to depend primarily on the examiner’s capability in 
making out examination questions. 


SUMMARY 


Under conditions which approached those of the classroom situa- 
tion four groups of students studied a chapter of material on memory. 
Each group studied with a different type of examination (essay, com- 
pletion, multiple-choice, or true-false) in mind. On the day following 
and four weeks after the last learning period each group was given 
all four types of tests. Under these conditions the following conclu- 
sions are indicated. 

1. The examination set of the individual during learning is of 
fundamental importance to the economy of learning. This is indi- 
cated by the following results. 










(a) On the first and second recognition tests (both multiple-choice 
and true-false) the recall examination set groups are in general superior 
to the recognition examination set groups. 

(b) On the first and second recall tests (both completion and essay) 
the recall examination set groups are superior to the recognition 
examination set groups. 

(c) On the first and second recall tests (both completion and essay) 
the essay examination set group is superior to the completion examina- 
tion set group. 

(d) Very slight differences are found between the recognition set 
groups on any of the tests. Most of the existing differences, however, 
favor the true-false examination set group. 

2. The methods of study which are used during learning seem to 
be determined by the examination set. It is probable that the differ- 
ences in methods of study account for the differences in test results. 
The chief difference lies in what may be called the organization factor. 
The essay group uses more than any other group a method, the making 
of summaries, in which organization is inherent. In addition to 
this method the essay examination set group uses the methods which 
are used predominantly by the objective examination set groups. The 
methods such as underlining, taking of random notes, and listing of 
names and numbers, which are used in greater measure by the recogni- 
tion examination set groups, seem suitable to learning rote responses 
only. 

3. In the light of these results and those of an earlier investigation 
it is suggested that, for the most economical learning, individuals 
should study preferably with an essay examination set. This means 
that the instructor must plan his testing program in such a fashion 
that this set will be used during learning. 


THE INFLUENCE OF THE TEST UPON THE NATURE 
OF MENTAL DECLINE AS A FUNCTION OF AGE¹ 


IRVING LORGE 
Institute of Educational Research, Teachers College, Columbia University 


The measurement of the relationship of age to ability is dependent 
largely upon the manner in which ability and age are defined. Age 
is generally defined as chronological age (and whatever it may involve 
in terms of physiology, education and experience). Ability is defined 
in two ways: (1) The level of difficulty of a task or a series of tasks 
that a person can do successfully, or (2) the number of tasks of equal 
difficulty that a person can complete successfully in a unit time. 
Regardless of definition, ability is measured usually by a series of 
tasks which are of varying difficulty, and which tasks are to be 
attempted within a unit time. The ability as measured is an undiffer- 
entiated mixture of power and speed; power representing the sheer 
ability to complete tasks successfully, and speed being a measure of 
the number of tasks that can be completed in a unit time. 

Assuming that age is measured by chronological age, then the 
relationship between age and ability will depend upon the ability and 
the test used to measure it. The fact that a high correlation exists 
between a power test of ability and a speed test of ability is not suffi- 
cient to indicate that each measures the same thing. It is only in 
the event that the correlation between power and speed tests of ability 
equals unity, that the equivalence of two measurements can be 
assumed. Although, in our studies, the IER Intelligence Scale CAVD, 
a test of power with unlimited time allowance, and the Otis Self- 
Administering Tests of Mental Ability (Higher Examination), a 
test of speed and power with a twenty-minute time allowance, are 
correlated .85, the relationship of each to age differs markedly. 

In order to determine the relationship between age and various 
tests of mental ability, eleven tests were administered to a group of 
adults ranging in age from twenty to over seventy years. The total 
population of one hundred forty-three took the following tests: 





1 Acknowledgment is hereby made of the services rendered by the personnel 
furnished by the Works Division, Emergency Relief Bureau of New York City on 
Project 89FB-125X. 

This study is part of a larger study in Interests, Attitudes, and Motives sup- 
ported in part by a grant from the Columbia University Council for Research in 
the Social Sciences. 

1. IER Intelligence Scale CAVD (usually levels M to Q, although some were 
oriented at lower levels) [given with unlimited time allowance]. 


2. Army Group Examination Alpha, either Form 5 or Form 7 [standard 
timing]. 


3. Bregman Revision of Army Alpha Examination [standard timing]. 
4. Wells Revised Alpha Examination, Form 5 [standard timing]. 


6. Otis Self-Administering Tests of Mental Ability (Higher Examination), 
Form B [twenty-minute time allowance]. 


7. Thorndike Intelligence Examination for High-school Graduates. 


In addition, eighty of the total population took the following tests: 


5. Otis Self-Administering Tests of Mental Ability (Higher Examination) 
Form A [twenty-minute time allowance]. 
8. Pressey Senior Verification Test. 
9. Pressey Senior Classification Test. 
10. Psychological Corporation Test VI Form A. 
11. Psychological Corporation Test VI Form B. 


The usual order of administration of the tests was one, two, three, 
four, six and seven; then after a lapse of several months, five, eight, 
nine, ten, and eleven were given. Usually a subject took five alternate 
forms of (1) the IER Intelligence Scale CAVD before he was given 
the remainder of the testing battery. In the event that a subject 
had taken more than one form, his score was considered as the arith- 
metical mean of all of his performances. The IER Intelligence 
Scale was given without time restrictions—the subjects took from 
four to thirty hours to complete a single form. Test (7), the Thorndike 
Intelligence Examination for High-school Graduates, was given in 
many forms. Usually the score used in this study was the mean of 
scores for five different forms of the test. 

In Table I are reported the means and standard deviations for 
two sub-groups and for the total population for each of the tests taken, 
and for age. In Table II are reported the coefficients of correlation 
between age and the various tests of intelligence for the sub-groups and 
for the total population. It is apparent that the relationship of 
ability to age varies as the test used to measure the ability. The 
penalty that age imposes upon a measure of ability can be estimated 
from the size of the correlation coefficients. The range of the correla- 
tion coefficients is from -.27 to -.48; the higher the negative correla- 
tion, the greater the penalty that age imposes upon the measurement. 
All other things being equal, the IER Intelligence Scale CAVD 
penalizes older persons less than the Otis Self-Administering Test A. 
The correlation between these last two tests is +.83. Yet the correla- 
tions of each with age are -.27 and -.48 for CAVD and Otis Self- 
Administering A, respectively. The correlation of Army Group 
Examination Alpha with CAVD is +.87 and with age -.36. A high 
correlation between two measures of ability still allows for considerable 
variation in the relationship of each test to age. The sharp separation 
between the relationship of age and the IER Intelligence Scale CAVD, 
and age and each of the other tests seems corroborative of the fact that 
the CAVD is a measure of power, whereas the other tests of the battery 
measure a mixture of power and of speed. 
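
The remark that a high correlation between two tests still leaves room for different relationships to age can be checked numerically. The following is a minimal sketch, assuming only the standard requirement that a three-variable correlation matrix be positive semidefinite; the function name `third_correlation_bounds` is illustrative and does not come from the study. Given the correlation between two tests and the correlation of one of them with age, it returns the range within which the other test's correlation with age may fall.

```python
import math


def third_correlation_bounds(r_xy: float, r_xz: float) -> tuple[float, float]:
    """Range of r_yz compatible with the given r_xy and r_xz.

    Derived from requiring the 3x3 correlation matrix of x, y, z
    to have a non-negative determinant.
    """
    centre = r_xy * r_xz
    half_width = math.sqrt((1.0 - r_xy ** 2) * (1.0 - r_xz ** 2))
    return centre - half_width, centre + half_width


# Values quoted in the text: CAVD with Otis A, +.83; CAVD with age, -.27.
low, high = third_correlation_bounds(0.83, -0.27)
print(f"Otis A with age may lie between {low:.2f} and {high:.2f}")
```

Under these assumptions the admissible interval runs from roughly -.76 to +.31, so the reported -.48 for Otis A and age is fully compatible with the +.83 correlation between the two tests.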

TABLE I. THE MEANS AND STANDARD DEVIATIONS FOR A BATTERY OF ELEVEN 
TESTS OF INTELLECTUAL ABILITY AND FOR AGE BY SUB-GROUPS AND BY 
TOTAL POPULATION 

                      | Group I, n = 80  | Group II, n = 63 | Total, n = 143 
Variable              |    M       SD    |    M       SD    |    M       SD 
 1. CAVD              | 408.49    15.73  | 397.87    18.28  | 403.81    17.70 
 2. Army Alpha        | 143.58    38.76  | 126.41    42.63  | 136.01    41.39 
 3. Bregman Alpha     | 149.89    39.39  | 121.90    40.23  | 137.56    42.12 
 4. Wells Alpha       | 149.48    37.99  | 125.05    44.79  | 138.71    42.88 
 5. Otis A            |  48.31    12.77  |                  | 
 6. Otis B            |  41.85    15.24  |  33.13    15.11  |  38.01    15.79 
 7. Thorndike H.S.    |  64.30    19.86  |  54.08    19.72  |  59.80    20.44 
 8. Verification      |  69.71    18.22  |                  | 
 9. Classification    |  67.38    20.00  |                  | 
10. Bureau A          | 114.46    34.03  |                  | 
11. Bureau B          | 100.21    37.63  |                  | 
12. Age (months)      | 428.73   148.93  | 451.42   150.90  | 438.72   150.23 























TABLE II. THE CORRELATION BETWEEN EACH OF THE TESTS OF INTELLECTUAL 
ABILITY AND AGE BY SUB-GROUPS AND BY TOTAL POPULATION 

Age and               | Group I, n = 80 | Group II, n = 63 | Total, n = 143 
 1. CAVD              |     -.2747      |      -.3338      |     -.3047 
 2. Army Alpha        |     -.3656      |      -.4463      |     -.4086 
 3. Bregman Alpha     |     -.3842      |      -.4793      |     -.4266 
 4. Wells Alpha       |     -.4198      |      -.5000      |     -.4586 
 5. Otis A            |     -.4858      |                  | 
 6. Otis B            |     -.4521      |      -.4753      |     -.4639 
 7. Thorndike H.S.    |     -.4326      |      -.5170      |     -.4725 
 8. Verification      |     -.3787      |                  | 
 9. Classification    |     -.2816      |                  | 
10. Bureau A          |     -.4398      |                  | 
11. Bureau B          |     -.4805      |                  | 

The regression equation of each of the tests of mental ability and 
age would show the rate of decline of mental ability with age. Experi- 
mentally this rate of decline could be shown (perhaps more vividly) 
by equating three groups of different age-levels on the basis of a power 
test so as to determine the rate of decline on speed tests. In order to 
evaluate experimentally the rate of decline on speed tests, three groups 
of age-levels (1) between 20.0 and 25.0, (2) between 27.5 and 37.5, 
and (3) over 40.0 years were equated person by person on the basis 
of the CAVD scale. The matching was made from among the total 
sample of one hundred forty-three persons and yielded three groups 
each of twenty-three persons who were matched for CAVD, and who 
then could be compared for scores on the Army Group Examination 
Alpha, the Bregman Revision of Army Alpha, and the Wells Revision 
of Army Alpha, as well as for scores on the Otis Self-Administering 
Test B and on the Thorndike Intelligence Examination for High- 
school Graduates. In Table III are reported the means and standard 
deviations for each of the three matched groups for the six tests of 
mental ability and for age. The penalty of age is significantly a 
function of the test of mental ability. The three groups equated on 
the basis of sheer power to perform mental ability tasks reveal impor- 
tant differences on the basis of tests which measure an undifferentiated 
mixture of speed and of power. The penalty can be assayed by con- 
sidering the difference in score as a ratio to the difference in age. For 


instance, the penalty in Army Group Examination Alpha for age 
groups of equal mental ability power would be 

\[ \frac{149.61 - 142.26}{384.04 - 274.04} \quad\text{and}\quad \frac{149.61 - 128.70}{604.70 - 274.04}, \] 

which yields .0668 and .0632 Army Alpha points per 
month of chronological age. Averaging these two determinations 
gives a penalty of .065 Army Alpha points per month, or .780 Army 
Alpha points per year. If mental ability is measured in terms of Army 
Alpha, persons over twenty years of age would, other things being 
equal, be penalized approximately three quarters of a point for every 
year of chronological age beyond twenty. 
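
The arithmetic of the penalty may be retraced as follows. This is a minimal sketch using the Army Alpha means and the mean ages in months reported for the three matched groups; the function name `age_penalty` is an illustrative assumption, not the author's notation.

```python
def age_penalty(score_young, score_old, age_young_months, age_old_months):
    """Score points lost per month of age between two groups matched on CAVD."""
    return (score_young - score_old) / (age_old_months - age_young_months)


# Army Alpha means and mean ages (months) for the matched groups.
p_1_2 = age_penalty(149.61, 142.26, 274.04, 384.04)  # Group I against Group II
p_1_3 = age_penalty(149.61, 128.70, 274.04, 604.70)  # Group I against Group III

per_month = (p_1_2 + p_1_3) / 2   # average of the two determinations
per_year = 12 * per_month         # twelve months to the year

print(round(p_1_2, 4), round(p_1_3, 4), round(per_month, 3), round(per_year, 2))
```

The same function could be applied to any other test column of the matched groups by substituting the corresponding means.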

Jones and Conrad¹ have reported on mental decline with age of a 
homogeneous population. Using Army Alpha as the basis for the 
measurement of mental ability, they find that the rate of growth or 





1 Jones, H. E. and Conrad, H. S.: “The Growth and Decline of Intelligence: 
A Study of a Homogeneous Group between the Ages Ten and Sixty.” Genetic 
Psychology Monographs, Vol. XIII, No. 3, 1933, pp. 223-298. 

decline “may be summarized as involving a linear growth to about 
sixteen years, with a negative acceleration beyond sixteen, to a peak 
between the ages of eighteen and twenty-one. A decline follows, 
which is much more gradual than the curve of growth, but which by 
age fifty-five involves a recession to the fourteen-year level.” This 


TABLE III. THE MEANS AND STANDARD DEVIATIONS FOR A BATTERY OF SIX 
TESTS OF INTELLECTUAL ABILITY AND FOR AGE BY THREE DIFFERENT 
GROUPS EQUATED ON THE BASIS OF ALTITUDE SCORE ON THE IER 
INTELLIGENCE SCALE CAVD 

Means 

Group | Age-range    | Age      | CAVD   | Army   | Bregman | Wells  | Otis SA B        | Thorndike 
      | (years)      | (months) |        | Alpha  | Alpha   | Alpha  | (twenty minutes) | High-school graduates 
I     | 20.0 to 25.0 | 274.04   | 405.25 | 149.61 | 149.39  | 158.35 | 44.39            | 66.92 
II    | 27.5 to 37.5 | 384.04   | 405.66 | 142.26 | 149.13  | 147.04 | 39.26            | 60.28 
III   | over 40.0    | 604.70   | 405.51 | 128.70 | 132.43  | 129.83 | 33.39            | 53.03 

Standard deviations 

Group | Age      | CAVD  | Army  | Bregman | Wells | Otis SA B        | Thorndike 
      | (months) |       | Alpha | Alpha   | Alpha | (twenty minutes) | High-school graduates 
I     | 15.28    | 12.75 | 32.66 | 27.61   | 32.28 | 12.86            | 14.81 
II    | 34.05    | 13.19 | 26.27 | 29.79   | 29.51 | 10.64            | 14.59 
III   | 89.85    | 13.05 | 32.33 | 31.36   | 31.31 | 12.29            | 16.91 




















imputed recession, in our belief, is not a loss of mental power as such, 
but rather an inability to work as fast with mental tasks. Yerkes¹ 
reports that the Army Group Examination Alpha sub-tests “are 
neither principally ‘speed’ tests nor ‘power’ tests but tend to show 
the characteristics of a ‘power’ test more at the low levels than they 
do at the high levels.” He also states “In all tests but test two more 
than sixteen per cent are through in double time and are, therefore, 
scored too low” in single time. We agree with Brigham² when, in 
reference to the Army Alpha, he states “at least in our consideration 
of the army tests, we may definitely discard the opinion that we are 
testing speed rather than intelligence.” Our agreement, however, is 
one of changed stress. The Army Alpha measures intelligence and 





1 Yerkes, R. M.: “Psychological Examining in the United States Army.” 
Memoirs of the National Academy of Sciences, Vol. XV, Part II, Chapter 9, 1921, 
pp. 415-420. 

2 Brigham, C. C.: Study of American Intelligence. Princeton University Press, 
1923, p. 12. 

speed in amounts that are undifferentiated. It is not that the Army 
Alpha measures power intelligence or speed intelligence, but rather 
that the test reports a score which is a mixture of both. 

If the corrections determined from our matched group data were 
applied to the Jones and Conrad results, corrections from 0.39 to 
29.25 Army Alpha points would be added to the obtained average 
scores on different age-levels. The application of the correction is 
shown in Table IIIa. 


TABLE IIIa. THE APPLICATION OF A CORRECTION FOR THE PENALTY THAT IS 
IMPOSED BY AGE ON MEASUREMENTS WITH THE ARMY GROUP EXAMINATION 
ALPHA: DATA OF JONES AND CONRAD 

Age     | Jones and Conrad | Correction for  | Adjusted Jones 
        | obtained mean    | age after 20.0  | and Conrad means 
19-21   |      100.7       |       0.39      |     101.0 
22-24   |       91.8       |       2.73      |      94.5¹ 
25-29   |       90.5       |       5.85      |      96.4¹ 
30-34   |       87.0       |       9.75      |      96.8¹ 
35-39   |       85.1       |      13.65      |      98.8¹ 
40-44   |       92.2       |      17.55      |     109.8 
45-49   |       80.7       |      21.45      |     102.2 
50-54   |       81.3       |      25.35      |     106.7 
55-59   |       78.6       |      29.25      |     107.9 














1 It is interesting to note the lower means for ages twenty-two to forty. These 
lower scores may be due to the selective migration of younger adults to urban 
centers for employment between ages twenty to forty-five. 


The adjusted Jones and Conrad means do not show the same 
gradual decline exhibited in their obtained scores. The correction 
for the penalty of age, indeed, may be a correction for slowness, for 
remoteness from school, for disutility of function, for lack of motiva- 
tion or for other physiological, educational or psychological changes. 
In mental decline, however, the power to cope with mental tasks must 
be considered freed from the influence of other factors or traits that 
may obscure it. In our opinion, speed obscures sheer mental power 
in older adults. 
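
The corrections applied to the Jones and Conrad means can be reproduced under one assumption that is inferred from the tabled figures rather than stated in the text: each correction equals .78 Army Alpha points multiplied by the number of years by which the midpoint of the age bracket exceeds twenty, the bracket being taken to run from its lower bound up to one year past its upper bound. The names below are illustrative.

```python
PENALTY_PER_YEAR = 0.78  # Army Alpha points per year, from the matched groups

# (age bracket, obtained mean) as reported for the Jones and Conrad data.
obtained = [((19, 21), 100.7), ((22, 24), 91.8), ((25, 29), 90.5),
            ((30, 34), 87.0), ((35, 39), 85.1), ((40, 44), 92.2),
            ((45, 49), 80.7), ((50, 54), 81.3), ((55, 59), 78.6)]


def correction(bracket, penalty=PENALTY_PER_YEAR, base_age=20.0):
    """Points to add for a bracket, using its midpoint (assumed rule)."""
    lower, upper = bracket
    midpoint = (lower + upper + 1) / 2  # assumption: bracket spans lower to upper + 1
    return penalty * max(midpoint - base_age, 0.0)


for bracket, mean_score in obtained:
    c = correction(bracket)
    print(f"{bracket[0]}-{bracket[1]}: {mean_score:5.1f} + {c:5.2f} = {mean_score + c:6.2f}")
```

With this rule the corrections run from 0.39 for the youngest bracket to 29.25 for the oldest, in agreement with the range quoted above.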


Miles and Miles¹ also have reported a study concerned with the 





1 Miles, C. C. and Miles, W. R.: “The Correlation of Intelligence Scores and 
Chronological Age from Early to Late Maturity.” American Journal of Psycho- 
logy, Vol. XLIV, 1932, pp. 44-78. 

decline of mental ability with age of a group of adults in two com- 
munities. Using a form, specially printed in large type, of the first 
sixty items of the “Otis Self-Administering Test of Intelligence, 
Higher Examination Form A” as a fifteen-minute test, they found 
that “From the high point in the intelligence score curve represented 
at or about eighteen years of age the trend is at first almost level, 
then gradually declines dropping fifteen or sixteen mental age months 
by the chronological age of fifty years.” Our results provide the data 
for the determination of the penalty of age on the Otis Self-Administer- 
ing Test of Mental Ability (Higher Examination) Form B, which is an 
alternate form of the curtailed examination used by Miles and Miles. 
Our population was given the Form B of the Otis on the twenty-minute 


time interval. For them the penalty of age may be estimated as 

\[ \frac{44.39 - 39.26}{384.04 - 274.04} \quad\text{and as}\quad \frac{44.39 - 33.39}{604.70 - 274.04}, \] 

or as .0466 and .0333 Otis 
(twenty-minute) points per month. Using the average of these two 
estimates of age penalty, the probable correction should be .0400 Otis 
(twenty-minute) points per month, or .480 Otis (twenty-minute) 
points per year. By estimating the probable twenty-minute score 
from the fifteen-minute score of the Miles and Miles material from 
the data in their Table II, a table can be constructed to show the 
influence of the estimated correction for age and slowness on the mean 
score of the city B population. In Table IIIb are reported the adjust- 
ments for Miles and Miles scores. 

The correction for the penalty that a test of mental ability (which 
is an undifferentiated mixture of power and speed) places upon age 
changes the curve of mental decline to a curve of mental plateau, or 
even to a curve of mental growth. Mental growth in the later age 
brackets may be more the result of a special selection of death than 
of true growth. If this be true, an apparent rise in the curve of 
intellectual status would not be so much an indication of growth as 
it would be of the fact that the poorer members of the population 
were being eliminated, either by death or disease. Death, indeed, 
may come earlier to the mediocre. Thorndike et al.,¹ in a study 
of the persons who died or were seriously ill before age 22.0, state “The 
general psychological theorem that all positive traits are correlated 
positively is here demonstrated in reverse; negative traits are correlated 





1 Thorndike, E. L., Bregman, E. O., Lorge, I., Metcalfe, Z. F., Robinson, E. E., 
and Woodyard, E.: Prediction of Vocational Success. Commonwealth Fund, 1934, 
284 + xviii, p. 91. 

negatively. Death, contrary to the widespread old wives’ tale, does 
not select the good or the gifted. Death comes earlier to the mediocre, 
to the inferior, to those not fully equipped for life’s battles.” 

TABLE IIIb. THE APPLICATION OF A CORRECTION FOR THE PENALTY THAT IS 
IMPOSED BY AGE ON MEASUREMENTS WITH THE OTIS SELF-ADMINISTERING 
TESTS OF MENTAL ABILITY (HIGHER EXAMINATION): DATA OF MILES AND 
MILES 

Age group | Obtained mean | Estimated mean | Correction for | Adjusted mean 
          | (15 minutes)  | (20 minutes)   | age after 20.0 | (20 minutes) 
15-19     |    38.50      |     44.50      |      0         |    44.5 
20-24     |    38.10      |     44.10      |      1.2       |    45.3 
25-29     |    39.22      |     45.22      |      3.6       |    48.8 
30-34     |    35.26      |     40.26      |      6.0       |    46.3 
35-39     |    35.06      |     40.06      |      8.4       |    48.5 
40-44     |    33.82      |     38.73      |     10.8       |    49.5 
45-49     |    34.50      |     39.50      |     13.2       |    52.7 
50-54     |    30.98      |     34.98      |     15.6       |    50.6 
55-59     |    28.74      |     32.74      |     18.0       |    50.7 
60-64     |    27.94      |     31.91      |     20.4       |    52.3 
65-69     |    24.22      |     27.22      |     22.8       |    50.0 
70-74     |    23.78      |     26.78      |     25.2       |    51.2 
75-79     |    20.46      |     22.69      |     27.6       |    50.3 
80-84     |    14.50      |     16.50      |     30.0       |    46.5 
85-89     |    15.30      |     17.30      |     32.4       |    49.7 
90-94     |    15.30      |     17.30      |     34.8       |    52.1 

















The correction applied to the Jones and Conrad data may be too 
large since in our material the Army Alpha average scores were one 
hundred forty-nine, one hundred forty-two, and one hundred twenty- 
eight, which are significantly higher than the high average of 100.7 
in their data. The correction, however, may also be too small, since 
the influence of age was left in the CAVD scores used to equate our 
matched groups. The correction applied to the Miles and Miles data 
may also be too small for the same reason (since their means were 
very much like ours). 

The influence of age may be subtracted from the CAVD scores by 
computing that part of the CAVD score which is freed from the 
influence of age. The regression equation gives the CAVD score 
that can be predicted from age. The difference between a person’s 
obtained CAVD score and his regressed CAVD score from age gives a 
score for CAVD independent of age (the statistical symbolism for this 
expression is $\mathrm{CAVD} \cdot \mathrm{Age} = \mathrm{CAVD} - \mathrm{CAVD}_{\mathrm{Age}}$). Any person’s raw 
score may be considered as made up of two parts: One part for that 
amount of the CAVD which can be predicted from a regression 
equation of CAVD on age; the other part as the residual, or difference, 
between the amount of CAVD predicted from the regression equation 
and the obtained raw score. The residual represents the error made 
in estimating a person’s CAVD score from a knowledge of age alone. 
This error of estimate is a score which is independent of age from 
which the regressed score was predicted. For each of the one hundred 
forty-three persons in our material we computed a CAVD score inde- 
pendent of age (CAVD · Age); that is, we computed the residual 
between a person’s raw score and his CAVD score predicted from age. 
With these age-independent CAVD scores as criteria, we matched 
three new groups in the age ranges 20.0 to 25.0, 27.5 to 37.5, and 
over 40.0 years. These three groups were matched, person by person, 
on the basis of the CAVD · Age score; that is, matched on the basis 
of the residuals. In Table IV are reported the data resulting from 
this new matching on the basis of an altitude score, freed from the 


influence of age. The correction for the Army Alpha may be estimated 
from $\frac{159.22 - 140.65}{392.09 - 271.39}$ and from $\frac{159.22 - 124.52}{603.50 - 271.39}$ as .1539 and .1045 
Army Alpha points per month. Averaging these two estimates gives 
a correction of .1292 Army Alpha points per month, or 1.5504 Army 
Alpha points per year, which contrasts sharply with the correction 
of .780 Army Alpha points per year estimated from groups equated 
on the basis of an altitude score in which the influence of age had 
been left in the score. The correction for the Otis Self-Administering 
Test of Mental Ability (Higher Examination) may be estimated from 
$\frac{49.74 - 39.17}{392.09 - 271.39}$ and from $\frac{49.74 - 31.65}{603.50 - 271.39}$ as .0876 and .0545 Otis 
(twenty-minute) points per month. Averaging the two estimates 
gives a correction of .0716 Otis (twenty-minute) points per month, 
or .8592 Otis (twenty-minute) points per year, which contrasts sharply 
with the correction of .480 Otis (twenty-minute) points per year 
estimated on the basis of scores from the group equated for altitude 
with age left in the score. 
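
The arithmetic of these corrections may be retraced in a few lines. The sketch below, in Python, takes the group means of Table IV (age in months, Army Alpha, and Otis) and divides each score difference from Group I by the corresponding age difference. It also shows the residual step on a handful of made-up ages and scores. The function names, the dictionary layout, and the illustrative ages and scores are assumptions introduced here and are not part of the original study.

```python
# Group means as printed in Table IV: age in months, Army Alpha, Otis S-A.
GROUP_MEANS = {
    "I":   {"age": 271.39, "alpha": 159.22, "otis": 49.74},
    "II":  {"age": 392.09, "alpha": 140.65, "otis": 39.17},
    "III": {"age": 603.50, "alpha": 124.52, "otis": 31.65},
}


def points_lost_per_month(test, older, younger="I"):
    """Score difference divided by age difference (points per month of age)."""
    young, old = GROUP_MEANS[younger], GROUP_MEANS[older]
    return (young[test] - old[test]) / (old["age"] - young["age"])


def averaged_correction(test):
    """Mean of the two estimates (I vs. II, I vs. III): per month, per year."""
    per_month = (points_lost_per_month(test, "II")
                 + points_lost_per_month(test, "III")) / 2
    return per_month, per_month * 12


def residuals_from_age(ages, scores):
    """Least-squares residuals of score on age: a score 'independent of age'."""
    n = len(ages)
    mean_age, mean_score = sum(ages) / n, sum(scores) / n
    slope = (sum((a - mean_age) * (s - mean_score) for a, s in zip(ages, scores))
             / sum((a - mean_age) ** 2 for a in ages))
    intercept = mean_score - slope * mean_age
    return [s - (intercept + slope * a) for a, s in zip(ages, scores)]


if __name__ == "__main__":
    for test in ("alpha", "otis"):
        estimates = [round(points_lost_per_month(test, g), 4) for g in ("II", "III")]
        print(test, estimates, [round(x, 4) for x in averaged_correction(test)])
    # Hypothetical ages (months) and CAVD scores, for illustration only.
    print(residuals_from_age([250, 300, 400, 600], [412, 409, 405, 398]))
```

Run on the printed means, the two Army Alpha estimates and their average agree with the text. The Otis average comes out nearer .0710 than the printed .0716, which may reflect rounding or a misprint in the source figures.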

By any reasonable evaluation of the estimated correction for the 
penalty that age imposes upon persons by a test which is an undiffer- 
entiated mixture of power and speed, it is clear that reported curves 
of mental decline with age are exaggerated. While it is recognized 
that a follow-up of mental power scores of individuals from a time 
shortly after birth through senescence, is the only technique for a 
definitive answer to the actual measurement of the relationship of 
mental ability to age, the results reported in this paper are suggestive 
in cautioning against the imputation of loss of mental power as a func- 
tion of advancing age, especially when such inference is based upon 
scores from tests that are a mixture of power and speed functions. 


TABLE IV.—THE MEANS AND STANDARD DEVIATIONS FOR A BATTERY OF SEVEN 
TESTS OF INTELLECTUAL ABILITY AND FOR AGE BY THREE DIFFERENT 
AGE-GROUPS EQUATED ON THE BASIS OF AN ALTITUDE SCORE FREED 
FROM THE INFLUENCE OF AGE (THE IER INTELLIGENCE SCALE 
CAVD SCORE·AGE) 

Group | Age-range    | Age (in months) | CAVD·Age +100 | CAVD   | Army Alpha | Bregman Alpha | Wells Alpha | Otis SA (twenty minutes) | Thorndike high-school graduates
Means
I     | 20.0 to 25.0 | 271.39          | 101.68        | 411.50 | 159.22     | 159.78        | 165.09      | 49.74                    | 72.42
II    | 27.5 to 37.5 | 392.09          | 101.69        | 407.00 | 140.65     | 147.65        | 149.43      | 39.17                    | 63.50
III   | over 40.0    | 603.50          | 101.68        | 400.38 | 124.52     | 125.78        | 123.65      | 31.65                    | 50.82
Standard deviations
I     |              | 15.71           | 14.24         | 14.08  | 27.78      | 24.63         | 24.33       | 13.57                    | 15.68
II    |              | 40.27           | 14.00         | 14.34  | 31.07      | 32.44         | 32.86       | 11.36                    | 17.06
III   |              | 95.96           | 14.65         | 14.17  | 38.07      | 35.30         | 40.94       | 13.51                    | 17.58
































The facts concerning the slowing up of reaction time and of 
coördination speed with advancing age, reported by Miles,¹ and by 
Bellis,² showed that simple and complex reaction times are slower 
in older adults than in younger adults. Ruger,³ in an elaborate and 
painstaking analysis of the measurements collected by Sir Francis 





1 Miles, W. R.: “Correlation of Reaction and Coordination Speed with Adults.” 
American Journal of Psychology, Vol. XLIII, 1931, pp. 377-391. 

2 Bellis, C. J.: “Reaction Time and Chronological Age.” Proceedings of the 
Society for the Experimental Study of Biology and Medicine, Vol. CCCII, 1933, pp. 
801-803. 

3 Ruger, H. A. with the assistance of Stoessiger, B.: “On the Growth Curves of 
Certain Characters in Man (Males).” Annals of Eugenics, Vol. II, Parts 1 and 2, 
1927, pp. 76-110. 
Galton in his first anthropometric laboratory at the Health Exhibition 
in South Kensington in 1884, finds that the dynamic or motor char- 
acters (strength of pull, hand grip, swiftness of blow and vital capacity) 
show “a rapid rise in childhood, a pubescent dip at about fifteen years, 
a prime or maximum at about twenty-four or twenty-five years, and 
senescent decadence following the maximum.” Further, for sensory 
acuity (vision and limit of audible pitch) he shows that decadence 
begins earlier than for motor characters. 

It may be true that decadence of any one character may not 
significantly hamper performance in a test of mental ability which is 
a mixture of speed and of power. Yet decline in visual acuity, in 
auditory acuity, in strength, in speed of reaction, in speed of coördina- 
tion as well as disuse of function, remoteness from school, and pre- 
occupation with life’s problems cumulatively must have a retarding 
influence upon the performance of an older individual in a test of 
mental ability which partakes of aspects of both power and speed.¹ 

Our study demonstrates that the reported facts of mental decline 
as a concomitant of age are, at the least, exaggerated. The power to 
do mental tasks or to solve those of life’s problems which must be 
approached mentally, probably does not deteriorate as a function of 
age. The reported deterioration is more apparent than genuine. It 
lacks genuineness in the sense that the test used to measure mental 
ability is not a genuine measure of mental power. Contaminating 
power with speed measurements among older adults obscures the true 
relationship of intellectual power to age. The inference of mental 
decline is an unfortunate libel upon adults. 





1 The problem of the relationship of speed of reaction to power in mental ability 
may be reopened in the field of the older adult. Speed of reaction is on the decline 
in adults after age thirty. Earlier in the life-span growth, decline and plateau may 
be the status of different individuals in a sample. The mixture of the three phases 
may obscure the true relationships of reaction speed to mental ability. It may be 
that a more thorough understanding of the nature of intelligence will be obtained 
by considering relationships among adults rather than among children or youths. 








AN EXPERIMENTAL STUDY OF THE EFFECT ON 
LEARNING OF SUPERVISED AND UNSUPERVISED 
STUDY AMONG COLLEGE FRESHMEN 


JOHN E. WINTER 
West Virginia University 


The experiment herein discussed is one phase of a more extensive 
study of efficiency in the educational system of West Virginia Uni- 
versity. State universities generally are handicapped more than 
private institutions in their choice of incoming students by the fact 
that they are required to accept all applicants from approved high 
schools within the State. As the output of even the approved high 
schools represents a wide variation in mental capacity and scholastic 
attainment, the State university is perennially confronted with the 
problem of a heavy mortality among freshmen students. For years 
it had been the policy at West Virginia University to dismiss either 
at mid-semesters, or at the close of the semester, all students who 
failed to make a passing grade in at least fifty per cent of their work. 
So large a proportion of those dismissed were freshmen, especially 
first-semester freshmen, that the administration adopted the plan of 
automatically reinstating freshman failures, thus retaining throughout 
the first year all freshmen who chose to remain. 

With the problem of freshman failures thus thrust upon us, an 
attempt was made to salvage as many as possible. This task did 
not appear prepossessing in view of the results obtained by various 
experimenters at other institutions. 

Pressey’s¹ conclusion, for example, is that it is not worthwhile to 
train students below the twenty-fifth percentile in intelligence. Bird’s² 
conclusion is almost identical with Pressey’s, and Potthoff³ found 
that out of thirty-four freshmen whose average grade was below 
passing only three graduated. 

The problem we undertook to solve was twofold: To ascertain (1) 
to what extent the heavy mortality among freshmen can be avoided; 
(2) to what extent high-school grades combined with a freshman 
intelligence test are a true index of the student’s capacity. 





1 Pressey, L. C.: School and Society, Vol. XXVIII, 1928, pp. 403-404. 
2 Bird, C. B.: Effective Study Habits. The Century Co., New York, p. 204. 
3 Potthoff, E. F.: School and Society, Vol. XXXIII, 1931, pp. 203-204. 

METHOD OF THE EXPERIMENT 


At the opening of the Fall semester all freshmen were required to 
take the American Council intelligence test. Those whose percentile 
ranks were twenty or below were chosen for the experiment, except 
that those among these two lowest deciles whose high-school average 
was fair or better were exempted. 

This gave us an experimental group of about sixty students. 
For our control groups we chose students of equally low rank from 
two preceding freshmen classes as well as a group from the class 
containing the experimental group who were prevented by schedule 
difficulties from joining the experimental class. All of the students 
chosen were in the Arts and Science College. 

The experimental group met three afternoons a week from two 
to five o’clock. Two hours each week were devoted to a discussion 
of the principles of study; the rest of the time was spent in the study 
of their regular courses. Once a week various instructors representing 
the sciences, foreign languages, history, and English, would meet 
with the students enrolled in these courses, at which time the student 
could bring up any special difficulties he was encountering. The 
function of the instructor in each case was to advise the student, not 
to coach or drill him. 

This method of supervising the study of these students was 
adopted because we believe that one of the most prolific causes of 
failure lies in the natural resistance of the student to getting down 
to work. Group study offers an added impetus which we have found 
to be effective. To make sure that the students actually did study, 
their work for the three-hour period was periodically checked. 

The following tables¹ show comparisons between the grades of 
the 1930 experimental group and those of 1928, 1929, and 1930 control 
groups. In each case the number of students in the experimental 
and control groups is equal, and each student in the experimental 
group is matched as nearly as possible with a student from the control 
group having the same percentile rank. All the students are taken 
from the two lowest deciles of their respective classes, and only those 
students were chosen whose high-school record showed an average 
grade below C, or fair. 





1 The data for Tables I to V were prepared by Miss Martha Fulton, a Univer- 
sity instructor, who was in charge of the study groups. 
Table I shows the distribution of the average grades of twenty-six 
of the 1930 experimental group and a like number of the 1930 control 
group at midsemesters. Table II shows the distribution of the 


TABLE I.—DISTRIBUTION OF AVERAGE MIDSEMESTER GRADES 
Twenty-six Freshmen from How-to-study Group—1930 
Twenty-six Freshmen from Control Group—1930 

Percentile rank | Group        | F E D C 
.199-.150       | How to study | 2 4 
                | Control      | .. .. 6 2 
.149-.100       | How to study | .. 1 4 
                | Control      | 1 2 
.099-.050       | How to study | .. 1 1 8 
                | Control      | 2 4 4 
.049-.000       | How to study | 4 i 5 
                | Control      | 1 2 1 1 
Total           | How to study | .. 2 8 16 
                | Control      | 1 4 12 9 




















TABLE II.—DISTRIBUTION OF SEMESTER AVERAGES 
Twenty-five Freshmen from How-to-study Group—1930 
Twenty-five Freshmen from Control Group—1930 

Percentile ranks | Group        | F E D C 
.199-.150        | How to study | 2 4 
                 | Control      | .. 2 3 3 
.149-.100        | How to study | .. 1 2 2 
                 | Control      | 2 1 
.099-.050        | How to study | 1 2 6 
                 | Control      | 4 3 2 
.049-.000        | How to study | 2 3 
                 | Control      | 3 2 
Total            | How to study | 2 8 15 
                 | Control      | 3 6 10 




















average grades of twenty-five of each of the groups in Table I at the 
end of the semester. Table III is a similar distribution of semester 
averages of thirty-one students of the 1930 experimental group and a 
like number of the 1929 control group. Table IV is a similar dis- 
tribution of semester averages of thirty-nine students of the 1930 
experimental group and thirty-four of the 1928 control group. Five 
members of this control group failed to complete the semester. Table 


TABLE III.—SEMESTER AVERAGES 
Thirty-one Freshmen from How-to-study Group—1930 
Thirty-one Freshmen from Control Group—1929 

Percentile ranks | Group        | F  | E  | D  | C  | B 
.199-.150        | How to study | .. | .. | 2  | 3  | .. 
                 | Control      | .. | 2  | 1  | 1  | .. 
.149-.100        | How to study | 1  | 1  | 3  | 3  | .. 
                 | Control      | .. | .. | 7  | 2  | .. 
.099-.050        | How to study | .. | 2  | 2  | 6  | .. 
                 | Control      | 1  | .. | 6  | 5  | 1 
.049-.000        | How to study | .. | 1  | 2  | 5  | .. 
                 | Control      | .. | 2  | 3  | .. | .. 
Total            | How to study | 1  | 4  | 9  | 17 | .. 
                 | Control      | 1  | 4  | 17 | 8  | 1 























TABLE IV.—SEMESTER AVERAGES OF FRESHMEN 
Thirty-nine Freshmen from How-to-study Group—1930 
Thirty-four Freshmen from Control Group—1928 

Percentile rank | Group        | F E D C 
.199-.150       | How to study | .. 3 3 
                | Control      | .. 2 7 2 
.149-.100       | How to study | 1 1 2 3 
                | Control      | .. 2 3 
.099-.050       | How to study | 2 2 7 
                | Control      | 2 4 
.049-.000       | How to study | 2 7 6 
                | Control      | 3 6 2 
Total           | How to study | 1 5 14 19 
                | Control      | 7 20 7 




















V shows the distribution of grades (not averages) of twenty-five 
students from the 1930 experimental group and a like number from 
the 1930 control group. Tables I to V indicate that the grades of 
the experimental group are in general superior to those of the control 
groups. 

Table VI shows the total number in each group making averages 
below C and the number making C or above. There is a noticeable 
contrast between this table and the following one from Pressey which 
represents the number of saved and lost in the lowest quartile of her 


TABLE V.—DISTRIBUTION OF GRADES 
Twenty-five Freshmen from How-to-study Group—1930 
Twenty-five Freshmen from Control Group—1930 

Group        | F  | E  | D  | C  | B  | A 
How to study | 14 | 3  | 31 | 50 | 46 | 4 
Control      | 36 | 13 | 35 | 39 | 19 | 7 














paired trained and control groups. Special training had little effect 
on her lowest quartile. Her trained group has 20.8 per cent saved 
compared with 12.5 per cent of her control group. Our results for 
the lowest two deciles show 53.6 per cent with C or above for the 
trained group, and 23.1 per cent for the control group. The contrast 


TABLE VI.—TOTAL AVERAGES BELOW C, AND C OR ABOVE 

Group              | Below C | Per cent | C or above | Per cent 
How-to-study group | 44      | 46.3+    | 51         | 53.6+ 
Control group      | 68¹     | 76.8+    | 22         | 23.1+ 





1 To the sixty-eight of the Control group representing grades below C should be 
added five more students who failed to complete the semester. 
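
The percentages of Table VI can be checked directly from the counts. The short Python sketch below does so. It assumes, as the printed control percentages seem to imply, that the five students who failed to complete the semester are counted with those below C, giving a denominator of ninety-five for each group. That reading is an inference from the figures, and the variable names are illustrative only.

```python
def percent(part, whole):
    """Percentage of `whole` represented by `part`."""
    return 100.0 * part / whole


# Counts as printed in Table VI.
how_to_study = {"below_c": 44, "c_or_above": 51}
control = {"below_c": 68, "c_or_above": 22}
control_dropouts = 5  # from the footnote to Table VI

n_study = sum(how_to_study.values())
# Assumption: the five who left are counted in the control denominator.
n_control = sum(control.values()) + control_dropouts

study_below_pct = percent(how_to_study["below_c"], n_study)
study_above_pct = percent(how_to_study["c_or_above"], n_study)
control_below_pct = percent(control["below_c"] + control_dropouts, n_control)
control_above_pct = percent(control["c_or_above"], n_control)

if __name__ == "__main__":
    print(n_study, round(study_below_pct, 2), round(study_above_pct, 2))
    print(n_control, round(control_below_pct, 2), round(control_above_pct, 2))
```

Under this assumption each computed percentage, truncated to one decimal, matches the corresponding entry printed with a plus sign in the table.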


PRESSEY’S TABLE 

Group   | Lost | Per cent | Saved | Per cent 
Trained | 19   | 79.1     | 5     | 20.8 
Control | 21   | 87.5     | 3     | 12.5 

















is greater than the figures indicate since Pressey’s figures cover the 
lowest quartile, whereas ours cover only the lowest two deciles. 

Table VII shows the number of honor points received by each 
of the groups. Grades A, B, C, D have values of three, two, one, 
and zero points respectively for each hour of credit. The experimental 
group has 51.5 per cent more honor points than the control group. 
But while there is a noticeable difference in the achievement of the 
two groups at the end of the first semester in the University, the 
question still remains as to the ultimate evaluation of the How-to- 
study course. Did this course prove to be of permanent value or 
did it produce merely a temporary spurt in achievement? To answer 
this question the scholastic records of both the experimental and 
control groups of the 1930 class were followed for a period of four 
years (1931-1934). Tables VIII and IX give the results. 


TABLE VII.—HONOR POINTS RECEIVED 

Year  | How-to-study group | Control group 
1928  | 414                | 260 
1929  | 348                | 268 
1930  | 308                | 178 
Total | 1070               | 706 
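
The honor-point reckoning and the comparison drawn from Table VII can be sketched as follows in Python. The point values for A to D are those stated in the text. Treating grades E and F as carrying no points is an assumption made here, and the function name and the sample record are hypothetical.

```python
# Points per credit hour as stated in the text (A=3, B=2, C=1, D=0).
HONOR_POINTS = {"A": 3, "B": 2, "C": 1, "D": 0}


def honor_points(credit_hours_by_grade):
    """Total honor points for a record given as {grade: credit hours}.

    Grades not listed in HONOR_POINTS (E, F) are assumed to earn nothing.
    """
    return sum(HONOR_POINTS.get(grade, 0) * hours
               for grade, hours in credit_hours_by_grade.items())


# Honor points as printed in Table VII: year -> (how-to-study, control).
TABLE_VII = {1928: (414, 260), 1929: (348, 268), 1930: (308, 178)}

study_total = sum(study for study, _ in TABLE_VII.values())
control_total = sum(ctrl for _, ctrl in TABLE_VII.values())
excess_pct = 100.0 * (study_total - control_total) / control_total

if __name__ == "__main__":
    # Hypothetical record: 3 hours of A, 6 of C, 4 of D.
    print(honor_points({"A": 3, "C": 6, "D": 4}))
    print(study_total, control_total, round(excess_pct, 2))
```

The totals reproduce the 1070 and 706 of the table, and the excess falls between 51.5 and 51.6 per cent, in agreement with the figure quoted in the text.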











Table VIII indicates that the average number of semesters attended 
by the experimental group is only .84 in excess of that of the control 
group, and the average number of hours passed by the experimental 
group is only 9.12 hours in excess of the control group. 


TABLE VIII.—ACHIEVEMENT OF THE 1930 SUPERVISED STUDY AND CONTROL 
GROUPS AT THE END OF THE FOURTH YEAR (1934) 

Group             | Average number semesters in university | Range in semesters | Average number hours passed 
How to study (41) | 4.75                                   | 1-7                | 54.97 
Control (34)      | 3.91                                   | 1-7                | 45.85 
Difference        | .84                                    | ..                 | 9.12 














Table IX indicates that the difference in the distribution of grades 
for the two groups is negligible. A comparison of individual records 
shows that nine of the experimental group (21.9 per cent), and six 
of the control group (17.6 per cent) did normal college work of sixteen 
hours or more per semester for four or more semesters. It would 
appear from these figures that the course in How-to-study produced a 
temporary salutary effect on the students’ achievement, but that 
over a period of years it can hardly be said to have justified itself. 
The results of this experiment raise another question, namely, 
the proper criteria for predicting freshmen scholarship. The tendency 
seems to be to regard high-school grades as a more reliable index than 
an intelligence test. Among the more recent studies Odell¹ concludes 
that high-school grades and intelligence tests have about equal 
predictive value, but that the usual coefficient of correlation of these 
two factors, from .40 to .60, indicates that predictions of freshmen 
grades are not much better than pure guesses. However, he con- 
cludes that predictions are more reliable in the case of high scores than 
low scores. Our figures show that, while all the students used in 
the experiment were taken from the lowest two deciles, and all had 
high-school averages below C, or Fair, 53.6 per cent made first semester 


TABLE IX.—AVERAGE HOURS FOR EACH GRADE 

Group        | A    | B     | C     | D     | E    | F 
How to study | 1.24 | 10.24 | 26.12 | 17.02 | 2.87 | 7.39 
Control      | 1.11 | 9.02  | 19.26 | 15.85 | 1.91 | 7.61 








freshman averages of C or above. Our results tend to confirm, there- 
fore, Odell’s conclusion in so far as it pertains to students with low 
high-school grades. 

Potthoff found that the coefficient of correlation between two- 
year college grades and intelligence was .435; between two-year 
college grades and high-school grades was .620; and between two-year 
college grades and first-quarter college grades was .810. He thus 
concludes that the first-quarter average is far more accurate in pre- 
dicting success than either intelligence scores or high-school averages. 
His multiple coefficient of correlation for all three criteria was .842, 
indicating that the predictive accuracy of the first quarter is increased 
but little by the addition of high-school average and intelligence 
score to the first-quarter average. Our results tend to confirm 
Potthoff’s conclusions. 


CONCLUSIONS 


1. The course in supervised study was most effective among 
those students in the lowest decile and a half. 





1 Odell, C. W.: Predicting Scholastic Success of College Freshmen. Univ. of 
Illinois Bureau of Research, 1927, No. 37. 

2. Contrary to Pressey and others it appears from our results 
that the trained group on the whole showed decided superiority over 
the untrained group in grades at the close of the first semester. 

3. Neither intelligence tests nor high-school grades are a reliable 
criterion of a student’s capacity, so far as the lowest two deciles are 
concerned. 

4. In view of the fact that approximately twenty per cent of the 
students in the lowest two deciles did satisfactory work for four or more 
semesters, it is apparent that the true status of a student’s scholastic 
ability cannot be determined until he has had ample opportunity to 
profit by what the University has to offer. 

5. The course in How-to-study produced a temporary salutary 
effect, but was of negligible permanent value. 








SINISTRAL AND MIXED MANUAL-OCULAR BEHAVIOR 
IN READING DISABILITY 


PAUL A. WITTY AND DAVID KOPEL 


Northwestern University 


In the literature upon reading disability there has appeared a 
number of studies purporting to show an etiological relationship 
between certain conditions of laterality (sidedness) and poor reading. 
The ‘‘frequent association” of left-handedness and dyslexia has been 
emphasized particularly by Dearborn;** similar observations have 
been made by Anderson and Kelley! and Hincks.” Left-eyedness 
has been associated with poor reading by Gates and Bennett,'* Mon- 
roe,* and Stromberg. And Dearborn’* has hypothesized a causal 
relationship between mixed hand-eye dominance (left-handedness and 
right-eyedness or right-handedness and left-eyedness) and legasthenia. 
Orton’s theory of reading disability,*!:*? couched in neurological terms, 
is essentially similar to that of Dearborn.® 

The present study was undertaken, in part,† to investigate the rela- 
tionship between ability in reading and various conditions of laterality. 
Although partial results of the study have been set forth elsewhere, *: 
additional data will be presented here regarding the methods and 
instruments employed in measuring laterality. 


SELECTION AND DESCRIPTION OF SUBJECTS 


The experimental group consisted of one hundred children of IQ 
eighty or above whose reading scores upon the Metropolitan Achieve- 
ment Tests were the lowest—one semester or more below their grade 
norms—among those of two thousand children in grades three to six 
inclusive of the Evanston Public Schools (District 75); the controls 
were normal readers of IQ eighty or above whose reading scores upon 
the Metropolitan Test were equivalent to or above their grade norms. 

Data concerning the two groups are presented in Table I. There 
are sixty-six boys and thirty-four girls in the problem group of poor 
readers. Comparable numbers of boys and girls are in the control 
group, and the two groups contain proportional numbers of children 





* In recent articles Dearborn*®*!° has modified his earlier conclusions. 

† In the writers’ comprehensive study of, and remedial endeavor with, one group 
of poor readers, attention was directed to health status, mental and sensory effi- 
ciency, emotional adjustment, interest and play behavior, and educational 
achievement. 
from the same grades and schools. The average IQ of the problem 
children is ninety-six; of the control, one hundred four. The average 
chronological age of the problem children is ten years, four months; 
of the control, nine years, two months. Age-grade distributions 
show the problem children to be retarded one semester in 3B; the 
retardation increases to one grade in 5B. On the average, the problem 
group is one year older than the non-problem. 


TABLE I.—DESCRIPTION OF PROBLEMS AND NON-PROBLEMS 

Group       | N   | Grade placement | Average age | Average IQ | Average reading achievement I | Average reading achievement II 
1           | 2   | 3               | 4           | 5          | 6                             | 7 
Problem     | 100 | 3B-6B           | 10-4        | 96         | -.96                          | -9.9 
Non-problem | 80  | 3B-6B           | 9-2         | 104        | +.39                          | +1.1 























1. Problems: Children one semester or more below their grade-norm in reading, 
on the Metropolitan Achievement Test. 


Non-problems: Controls, matched for school, grade, sex; reading ability equiva- 
lent to or above grade-norms. 


4. Age range—problems: 7-4 to 14-7; non-problems: 7-4 to 12-4. 
5. Kuhlmann-Anderson Intelligence Tests. Range for problems: eighty to 
one hundred sixteen; for non-problems: eighty-three to one hundred twenty-three. 


6. Figures, in grades, represent grade placement minus grade equivalent in 
reading (October 1934). 


7. Figures, in months, represent mental age minus reading age (October 1934). 


Reading retardation of .67 grade in 3B increases to 1.1 in 5B, with 
the average retardation about one grade for the problem children. 
For the non-problem children, the average acceleration in reading is 
about one-half grade. Thus, the disparity between the groups is 
approximately one and one-half grades. (These averages from the 
Metropolitan tests were corroborated by the Gates Silent Reading 
Tests.) The reading achievement of the non-problems is in general 
consonant with mental age, although their achievement is slightly 
below mental age in grades three and four, and slightly above in grades 
five and six. 


THE MEANING OF LATERALITY 


Laterality is a generic term; it refers to bodily behavior char- 
acterized by the unilateral preference of the external bipartite organs. 
The type most commonly observed is that exhibited in the preponder- 
ant use of one hand. Individuals are popularly designated as right- 
or left-handed, or ambidextrous—in so far as they characteristically 
use the right or the left hand—or (in a few cases) both hands with 
equal frequency. Similarly, individuals display in their vision pref- 
erences for the right or the left eye—or occasionally for both eyes with 
equal frequency. 

Most studies of laterality have investigated the nature and inci- 
dence of handedness and eyedness, separately and in relation to each 
other. Considerable knowledge about these phenomena already has 
accumulated; and more reliable techniques for the measurement of 
eyedness and handedness are available than for other phases of 
laterality. 

Only in a few very recent investigations of laterality are data on 
foot preference presented. The “preferred side” of the body also 
has been studied, but satisfactory criteria for this function are not 
available. Although ear dominance is mentioned by a few investiga- 
tors, the concept is of doubtful validity, since little ostensible motor 
behavior is involved, and relative acuity is frequently mistaken for 
dominance. * 

Underlying most discussions of laterality is the assumption, explicit 
or tacit, that handedness, eyedness, and so forth, are manifestations 
reflecting hemispherical cerebral dominance. Evidence has been 
adduced recently which suggests that cerebral dominance is related 
to the side which is most used and is therefore a secondary character- 
istic rather than a primary one.* Furthermore, “there is an indica- 
tion that a consistent dominance does not exist and that eye, ear and 
body dominance does not necessarily follow to establish a theory of 
complete cortical dominance.’’*? Indeed, the validity of the concept 
of cortical dominance itself has been challenged. Kelly?! concludes 
that: “There is no known check on cerebral dominance which is 
sufficiently dependable to enable one to investigate the influence of 
that factor on the perception of the orientation of symbols.” + Bethe? 
believes the relationship between cerebral dominance and reading 





* This error is exemplified in the writing of Twitmyer and Nathanson.* . . . 
Many investigators have noted the lack of relationship between visual acuity and 
visual dominance.'* J. M. Smith*”? and others have demonstrated the sensory 
pre-eminence of the non-preferred hand. 

† Orton’s** major premise in his strephosymbolic theory thus appears to be 
questionable, at least. 
proficiency is not marked. And Mintz?’ has found that feeble-minded 
children of unstable unilateral cerebral dominance exhibit no greater 
reversal tendency than do other feeble-minded children of the same 
reading level. Finally, Kirk’s study”*® suggests that ‘‘the inference 
of a dominant hemisphere, based on the exhibition of a preferred hand, 
is unfounded.” 

Since, “In the practical clinical situation it matters little whether 
one describes a case of poor reading in terms of cerebral dominance or 
ocular and manual dominance.’’*! this study will not investigate the 
factor of cerebral dominance.* Only the methods employed in meas- 
uring handedness and eyedness and their relation to poor and to good 
reading will be treated in the following sections. 


THE MEASUREMENT OF HANDEDNESS 


Of many handedness investigations, the comprehensive study of 
Helen Koch* includes perhaps the most useful method for measuring 
manual behavior.† Koch noted the manual choices of her subjects 
in one hundred five standardized situations; she then demonstrated 
the feasibility of substituting a questionnaire for performance tests. 

The writers concluded that the Koch questionnaire had apparent 
limitations for use with young children. It was decided, therefore, 
to select the items which were found by Koch to differentiate best 
between right- and left-handed groups and to adapt these to the 
understanding levels of children. It was hoped that the tendency 
toward lowered reliability resulting from use of a shortened form of 
the testing instrument, would be compensated for by the increased 
validity accruing from the greater predictive value of the items 
selected. 

Koch presents data showing the degree to which each of the one 
hundred five questions differentiated right- from left-handed indi- 
viduals. In Table 13 of her study one finds that the predictive value 
of the items varies greatly; i.e., from 5.9 to one hundred per cent of 
the individuals in the “right-handed group” made “right” responses 
on individual items. Thus in some situations few right-handed 





* Some data pertaining to foot, ear, and side preference were collected in this 
study and are to be reported later. 

† The device developed by Van Riper“ has serious limitations (recognized by 
the author). Durost’s ‘Criterion Questionnaire’”’!* consists of only ten items, 
one of which—‘‘2’’— proved very unreliable in Koch’s study; two other items 
—‘8” and ‘‘10’— appear somewhat lacking in validity. 
individuals exhibited dextral behavior and, in others, no left-handed 
persons exhibited sinistral behavior. Further scrutiny of the table 
shows that certain items produced a preponderance of dextral responses 
in both the right- and left-handed (part III: 7, 15, 18; part IVA: 22, 
24; part IVB: 3b, 5a, 6a, 7a, 8b, 15b, 17a). 

Since the validity of many items appeared questionable, it was 
decided to eliminate those which Koch found did not differentiate 
sharply and reliably. In devising a questionnaire to be used with 
school children items were selected upon which at least eighty per cent 
of the right-handed and not more than twenty per cent of the left- 
handed made dextral responses (part I: 10; II: 7, 8, 10; III: 4, 5, 6, 
9, 10, 12, 20; IVA: 1, 2, 3, 4, 11, 13, 20, 21, 25; IVB: 9a, 9b, 20b and 
25b). These were eliminated after children’s responses had been 
studied in the Northwestern University Psycho-Educational Clinic. 
Three items (part I: 5; IVB: 12b, 23b), having scores somewhat lower 
than those required by the criterion, were added—two because they 
represented situations that had been used frequently in other investiga- 
tions, and the third, because it was thought desirable to include at 
least one question to which the proper dextral answer would be “left” 
and the proper sinistral answer, “right.” 

Finally, twenty-two items were arranged in a new questionnaire.* 
The complex sentence structure of some items was simplified; all 
sentences were re-worded in terms of children’s vocabularies—sub- 
stitutions were made from the first three thousand words of Thorndike’s 
Word Book for words not contained in this number. The resultant 
questionnaire,† it was believed, would provide a valid and reliable 
means of measuring degree of dextrality in elementary-school children. 

The questionnaire (administered individually by the teacher) was 
scored in terms of the percentage of dextral responses.‡ On the 
assumption that dextrality ranges from nothing to one hundred 
degrees, handedness indexes reflecting the percentage or degree of 
right-handedness were computed. The following formula, similar 
to one used by Van Riper,** was employed: 

Handedness index = $\dfrac{R + E/2}{N}$ 








* Van Riper’s*‘ modification of the Koch questionnaire contains twenty-nine 
items, selected in a somewhat different manner. 

† The questionnaire may be obtained upon request. 

‡ One item, it will be recalled, differs from the others in that a “left” response 
was considered indicative of dextrality. 
where R and E equal the number of items in which the use of the right 
hand and either hand (items answered R and L or items unanswered) 
were indicated, respectively, and N equals the total number of items. 
Thus an individual who encircled all the L’s would earn an index of 
zero, while one who encircled all the R’s would have an index of 
one hundred.* 
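
The index described above can be sketched in a few lines of Python. This is only an illustrative reading of the formula, not the authors' scoring procedure: the function name `handedness_index`, the response codes "R", "L", and "E", and the multiplication by 100 (inferred from the statement that an all-R record earns an index of one hundred) are assumptions. The one item on which a “left” answer counts as dextral is assumed to have been recoded before scoring.

```python
def handedness_index(responses):
    """Percentage handedness index from a list of item responses.

    Each response is "R" (right hand), "L" (left hand), or "E" (either
    hand: both R and L marked, or the item left unanswered).
    """
    n = len(responses)
    if n == 0:
        raise ValueError("at least one item is required")
    unknown = set(responses) - {"R", "L", "E"}
    if unknown:
        raise ValueError(f"unexpected responses: {sorted(unknown)}")
    r = responses.count("R")  # items answered with the right hand
    e = responses.count("E")  # items counted as either hand
    # Assumption: the ratio (R + E/2) / N is scaled by 100 to give 0-100.
    return 100.0 * (r + e / 2) / n


if __name__ == "__main__":
    # A hypothetical 22-item record: 11 right, 2 either, 9 left.
    print(handedness_index(["R"] * 11 + ["E"] * 2 + ["L"] * 9))
```

Counting each “either” item as half a right-hand response places a child who answers every item ambiguously exactly at the midpoint of the scale.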

The distribution of the scores or indexes which is presented in 
Table II suggested a five-fold classification of handedness: Right; 
right, with left-handed tendencies; ambidextrous; left, with right- 
handed tendencies; and left. 

This arrangement avoids the cumbersome, manifold divisions of 
Rife,** Ojemann,”** and Downey," as well as the over-simplification 
of other investigators who have employed a bipartite or a tripartite 
classification. (It should be noted that the categories are in agree- 
ment with the suggestion made by Cuff® for the classification of eyed- 
ness—a somewhat analogous phenomenon.) 

Table II contains the percentages and range of indexes in each 
of the five handedness groups. Although “the method of measuring 
strength of preference profoundly influences the results obtained,”* 
the findings given in this paper are similar to those reported by other 
investigators.† 


LEFT-HANDEDNESS AND READING 


Attention may now be directed to the relative size of the “R,” 
“R_L,” “A,” “L_R,” and “L” handedness classes in the problem and 
non-problem groups. One is immediately impressed with the high 
degree of correspondence displayed by the percentages of problems 
and non-problems in the several categories. Of the former, only 
three per cent are ambidextrous; of the latter, five per cent. In the 
R_L category, problems have eight, non-problems, fifteen per cent; 
and in the L_R group, problems have eight, non-problems, three per 
cent. Very interesting is the fact that in comparison with the con- 
trols, the problems exhibit a somewhat higher proportion of right- 
handedness and a slightly lower incidence of left-handedness. Adding 





* Indexes were checked by dynamometer responses and by verbal reports. 

† Selzer* presents a list of the percentages of left-handedness reported in 
various studies. Smith,* Downey,!? and Jastak™” give additional figures. A 
critical discussion of statistical incidences of right- and left-handedness is reported 
in Wile.“ 
the L_R to the L groups changes the ratio of sinistrality in problems and 
non-problems only from 6:8 to 14:11, surely not a significant difference 
in either case. This finding tends to negate the assertions of 
various writers (named above) that reading disability is frequently 
associated with left-handedness. Corroborating the present finding 
is the work of Gates and Bennett,'* Hildreth,’* and Haefner.” 


THE MEASUREMENT OF EYEDNESS 


Although there are many devices for ascertaining which eye is 
dominant, a satisfactory method for the measurement of degree of 
eyedness has not been developed. Crider* has recently shown the 
inadequacy of Lund’s** monoptometer and of Cuff’s’ manoptometer— 
rather elaborate instruments which purport to measure direction and 
degree of ocular dominance. 

In the present study eyedness was investigated with three com- 
monly used devices: (1) The manoptoscope, (2) the paper-hole test, 
and (3) the finger-object test. All are described by Scheidemann;* 
the procedure recommended by her was followed with “2” and “3.” 
In “1,” because of the unsatisfactory directions given by Parson** 
for his manoptoscope, the following procedure, which had been found 
useful in clinical practice, was substituted: the subject was asked (a) 
to stand erect with heels together, (b) to hold the manoptoscope with 
both hands, the center seam between his thumbs, (c) to look through 
the manoptoscope, with both eyes open, at the examiner’s face, 
twenty feet distant. 

All children were given six trials with each test. When any doubt 
occurred as to the accuracy of the results—because of obvious bias 
attributable to the child’s hand or body position—additional trials 
were given. For classifying the results a scheme paralleling that used 
for handedness was adopted. The eyedness test results were easily 
divided into five groups. 

R = Right: 88.9 per cent to 100 per cent, or sixteen to eighteen trials “right” on three tests. 

R_L = Right, with left-eyed tendencies: 61.1 per cent to 83.3 per cent, or eleven to fifteen trials “right.” 

A = Ambi- or impartial-eyed: 44.4 per cent to 55.5 per cent, or eight to ten trials “right.” 

L_R = Left, with right-eyed tendencies: 16.7 per cent to 38.9 per cent, or three to seven trials “right.” 

L = Left: 0 per cent to 11.1 per cent, or none to two trials “right.” 
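
The five-fold grouping just listed amounts to a lookup on the number of “right” sightings out of eighteen trials. The sketch below is an illustration under that assumption only: the function name `eyedness_class` and the labels "R_L" and "L_R" are chosen here for convenience, and the additional trials given in doubtful cases are not modelled.

```python
TRIALS = 18  # six trials on each of the three sighting tests


def eyedness_class(right_trials):
    """Map a count of "right" sightings (0-18) to one of five classes."""
    if not 0 <= right_trials <= TRIALS:
        raise ValueError("right_trials must lie between 0 and 18")
    if right_trials >= 16:
        return "R"    # right
    if right_trials >= 11:
        return "R_L"  # right, with left-eyed tendencies
    if right_trials >= 8:
        return "A"    # ambi- or impartial-eyed
    if right_trials >= 3:
        return "L_R"  # left, with right-eyed tendencies
    return "L"        # left


if __name__ == "__main__":
    for count in (18, 12, 9, 5, 1):
        print(count, eyedness_class(count))
```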
TABLE III.—HAND AND EYE PREFERENCES—PERCENTAGE 

                       R   R_L     A   L_R     L   Total     N 
Hand: 
  Problem.........    75     8     3     8     6     100   100 
  Non-problem.....    69    15     5     3     8     100    73 
  Total...........    72    11     4     6     7     100   173 
Eye: 
  Problem.........    52     8     5     5    30     100   100 
  Non-problem.....    54    13     3     9    22     101    78 
  Total...........    53    10     4     7    26     100   178 


























Hand: R = Right: 86.4 per cent to 100 per cent, or nineteen to twenty-two 
(of twenty-two) items “right” on handedness questionnaire. 
R_L = Right, with left-hand tendencies: 63.7 per cent to 81.8 per cent, 
or fourteen to eighteen items “right.” 
A = Ambidextrous: 45.5 per cent to 59.1 per cent, or ten to thirteen 
items “right.” 
L_R = Left, with right-hand tendencies: 18.2 per cent to 40.9 per cent, or 
four to nine items “right.” 
L = Left: 0 per cent to 13.7 per cent, or none to three items “right.” 
Eye: Letters have similar connotation as above. Degree of eyedness was 
determined in similar fashion by consistency of monocular sighting with 
manoptoscope, paper hole, and finger-object tests. 


In Table III are displayed the percentages of problem, non- 
problem, and total cases in the five eyedness categories. Other 
investigators have almost invariably employed only three eyedness 
categories for classifying subjects. To make these comparable with 
the present findings it is necessary, therefore, to combine the “R_L” 
with the “R,” and the “L_R” with the “L” classes. Thus, of all 
cases, sixty-three per cent are right-, thirty-three per cent are left-, 
and four per cent are ambi- or impartial-eyed. These proportions 
approximate closely those found in many reports on the incidence 
of eye dominance.* This correspondence warrants the belief that the 
combined group of children are representative of the normal population 
in their eye (as well as in their hand) preferences. 


LEFT-EYEDNESS AND READING 


The generalization just advanced may be extended both to non- 
problem and to problem cases. Examination of Table III discloses 





* Cf. summaries by Downey" and Jastak.™ 
differences between the percentages in the eyedness classes no greater 
than those expected from the effect of errors of sampling. The largest 
discrepancy (in the L group: thirty per cent problems as compared 
with twenty-two per cent non-problems) is reduced to a normal 
amount when L and L_R groups are combined. This procedure yields 
thirty-three per cent and thirty-one per cent left-eyed in problem and 
in non-problem cases. A similarly close correspondence is present 
in the other categories. Therefore, if left-eye dominance is a source of 
difficulty in reading, as is suggested by Dearborn,’ Gates and Bennett,'* 
Monroe,”* and Stromberg, then this factor operates to an apparently 
equal extent in groups of poor and of good readers. 


MIXED DOMINANCE AND REVERSALS 


Since no relationship was found between reading ability and 
handedness as well as reading ability and eyedness, it will be interesting 
to ascertain whether any association exists between reading ability 
and various conditions of manual-ocular dominance. 

In the first column of Table IV are arrayed the twenty-five possible 
combinations of handedness and eyedness. All “R” and “R_L” 
combinations (four) were considered “right”; all “L” and “L_R” 
combinations (four) were considered “left”; all others (L and A, R_L and 
L_R, R and L, and so forth) were included in the mixed or ambivalent 
group. 
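
The grouping rule just described can be written out as a short Python sketch. The names `dominance` and `CLASSES` and the labels "R_L" and "L_R" are illustrative choices, not the authors'; the sketch only restates the rule that a pair counts as right (or left) when both hand and eye fall in the two right (or two left) classes.

```python
from itertools import product

CLASSES = ["R", "R_L", "A", "L_R", "L"]
RIGHT = {"R", "R_L"}
LEFT = {"L", "L_R"}


def dominance(hand, eye):
    """Classify a hand-eye pair as "right", "left", or "mixed"."""
    if hand not in CLASSES or eye not in CLASSES:
        raise ValueError("hand and eye must each be one of the five classes")
    if hand in RIGHT and eye in RIGHT:
        return "right"
    if hand in LEFT and eye in LEFT:
        return "left"
    return "mixed"


def count_combinations():
    """Tally the twenty-five hand-eye pairs by dominance condition."""
    tally = {"right": 0, "left": 0, "mixed": 0}
    for hand, eye in product(CLASSES, CLASSES):
        tally[dominance(hand, eye)] += 1
    return tally


if __name__ == "__main__":
    print(count_combinations())
```

Of the twenty-five pairs, four fall in each consistent group, which leaves seventeen in the mixed group.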

By reference to Table V, in which the data are summarized, one 
may see that of the problems forty-three per cent and of the non- 
problems forty per cent display mixed dominance; only five per cent 
of the former and four per cent of the latter are consistent in left 
ocular and manual behavior. In other words, conditions of right, left, 
and mixed manual-ocular dominance occur no more frequently among 
reading problems than among non-problems. 

Allegations concerning the nocuous effect of mixed dominance have 
frequently specified its causal relationship to reversals. For this 
reason reversal data were scrutinized. The Betts Tests of Oculomotor 
and Perception Habits, which consist of a series of ten slides, were 
employed. Used were slides twenty-two, forty, forty-one, sixty-one, 
and eighty-two, containing a total of one hundred eight easily confused 
and reversed letters, numbers, and words.* The number of reversals 
for the problems decreased steadily from nine in grade 3B to four in 
6B; the average was 6.2. For the non-problem group, the incidence 
of one remained constant throughout the grades. (Cf. Table VI.) 
TABLE IV.—CONDITIONS OF HAND-EYE DOMINANCE AND THEIR RELATION TO 
REVERSAL TENDENCIES 

Conditions of                                              Reversal data 
dominance:       Problem    Non-problem         Problem                  Non-problem 
Hand-eye           N     %     N     %        N  Number  Average       N  Number  Average 
Right: 
  R-R             44    ..    25    ..       43     246      5.7      24      43      1.8 
  R-R_L            4    ..     5    ..        4      46       ..       5       2       .. 
  R_L-R            3    ..     7    ..        2      11       ..       7       4       .. 
  R_L-R_L          1    ..     3    ..        1       5       ..       3       0       .. 
                  ..    52    ..    56       50     308      6.2      39      49      1.3 
Mixed: 
  R-A              3    ..     2    ..        3      24       ..       2       0       .. 
  R-L_R            4    ..     4    ..        4      12       ..       4       1       .. 
  R-L             20    ..    13    ..       20     117      5.9      13      10       .8 
  R_L-A           ..    ..    ..    ..       ..      ..       ..      ..      ..       .. 
  R_L-L_R          1    ..    ..    ..        1       4       ..      ..      ..       .. 
  R_L-L            3    ..     1    ..        3      13       ..       1       3       .. 
  A-R             ..    ..     3    ..       ..      ..       ..       3       8       .. 
  A-R_L            1    ..    ..    ..        1       8       ..      ..      ..       .. 
  A-A             ..    ..    ..    ..       ..      ..       ..      ..      ..       .. 
  A-L_R           ..    ..    ..    ..       ..      ..       ..      ..      ..       .. 
  A-L              2    ..     1    ..        2      11       ..       1       1       .. 
  L_R-R            3    ..     1    ..        2       2       ..       1       1       .. 
  L_R-R_L          1    ..     1    ..        1       8       ..       1       0       .. 
  L_R-A            1    ..    ..    ..        1      12       ..      ..      ..       .. 
  L-R              2    ..     2    ..        2      14      7.0       2       0        0 
  L-R_L            1    ..    ..    ..        1      10       ..      ..      ..       .. 
  L-A              1    ..    ..    ..        1      12       ..      ..      ..       .. 
                  ..    43    ..    40       42     247      5.9      28      24       .9 
Left: 
  L_R-L            3    ..    ..    ..        3      23       ..      ..      ..       .. 
  L_R-L_R         ..    ..    ..    ..       ..      ..       ..      ..      ..       .. 
  L-L_R           ..    ..     1    ..       ..      ..       ..       1       0       .. 
  L-L              2    ..     2    ..        2      19      8.5       2       0        0 
                  ..     5    ..     4        5      42      8.4       3       0       .0 
Total            100   100    71   100       97     597      6.2      70      73      1.0 
There was found also a continuous decrease in total errors (including 
reversals). The problem children made, on the average, twenty-five 
errors in grade 3B, and ten in 6B. The non-problem group exhibited 
an average of three errors in 3B and one in 5B. For the groups as a 


TABLE V.—THE INCIDENCE OF HAND-EYE DOMINANCE* AND ITS RELATION TO 
REVERSALS 

                       Percentages of subjects           Average number of reversals made by 
                   Right  Mixed  Left  Total    N      Right  Mixed  Left  Total    N 
Problem........      52     43     5    100   100       6.2    5.9   8.4    6.2    97 
Non-problem....      56     40     4    100    71       1.3    0.9   0.0    1.1    70 

* Of the twenty-five possible combinations of handedness and eyedness under 
the method described in the text, the following were considered “Right”: R-R, 
R-R_L, R_L-R, R_L-R_L. The following were considered “Left”: L-L, L-L_R, L_R-L, 
L_R-L_R. Included in the “mixed” group are all other combinations. 


TABLE VI.—AVERAGE NUMBER OF REVERSALS BY GRADE ON BETTS TESTS OF 
OCULOMOTOR AND PERCEPTION HABITS 

Grade                     3B    3A    4B    4A    5B    5A    6B   All 
Problems............     8.7   7.0   7.5   5.7   4.2   5.0   3.9   6.2 
N...................       5    12    14    16    26     9    15    97 
Non-problems................ .{L.1)./2.0)./6.0).] .8). .{1.0).)1.2)..) 23). .]1.1 
er S31]. . .12).. 12)... JG]... S.. 16)... 112). . 179 





















































TABLE VII.—AVERAGE NUMBER OF ERRORS (INCLUDING REVERSALS) BY GRADE ON 
BETTS TESTS OF OCULOMOTOR AND PERCEPTION HABITS 

Grade                     3B    3A    4B    4A    5B    5A    6B   All 
whole, the average was 17.2 errors for the problems and 2.4 for the 
non-problems. (Cf. Table VII.) 

Table VI reveals “a decline in frequency of the (reversal) tendency 
in higher as contrasted with lower grades”; in all grades poor readers 
tend to make many reversals. In this respect, the data corroborate 
the work of Hildreth’* and Dearborn.’ The cause of this reading 
difficulty is not completely understood; however, several hypotheses 
have been advanced. For example, Dearborn® and others insist that 
reversals are associated with, and are caused by, mixed ocular-manual 
dominance. It will be pertinent, therefore, to examine the data for 
the problem cases and for the non-problem cases to ascertain whether 
reversal tendencies predominate in the children displaying mixed 
dominance. In Table V are shown the number of reversals for the 
dominant right, dominant left, and mixed types of children. The fact 
appears significant that reversals occur with slightly smaller frequency 
in the mixed group than in the dominant right or dominant left group. 
Furthermore, those of a mixed dominance in the non-problem group 
exhibit no greater tendency to make reversals than do those not 
displaying this irregularity. Inspection of Table IV, which contains 
a detailed account of the reversal tendencies in each of the twenty-five 
hand-eye groups, discloses the fact that in the R-L and L-R categories 
(in which the degree of mixed dominance is greatest) the average 
number of reversals is similar to that in the other groups. The data 
at this point clearly support those of Gates,'* Woody and Phillips,® 
Kirk,?* and Teegarden,‘ showing that there is little, if any, relation- 
ship between reversal errors and mixed eye-hand dominance. 


SUMMARY 


The data presented in this paper have shown a lack of relationship 
between various conditions of handedness and degree of reading 
efficiency. Eyedness was similarly found to be unrelated to reading 
proficiency. Mixed hand-eye dominance (as well as consistent 
manual-ocular behavior) was found to have no association with reading 
ability as measured by standardized tests and as gauged by the tend- 
ency to make reversals. 

Decisive as these findings are, they do not warrant discarding the 
study of laterality in the diagnostic examination of a reading dis- 
ability case. Certain conditions of laterality may be contributing 
factors in emotional difficulties related to the poor reading. For 
example, a strongly left-handed child whose manual behavior has been 
changed by unwise pressure may be working under nervous tensions 
prejudicial to effective learning. Left-eye dominance may, in some 
cases, induce definite (and readily observed) right-to-left eye move- 
ments in reading. In the kindergarten or first-grade pre-reading 
situation the child of known sinistral tendencies should receive, 
perhaps, special supervision and training in the development of proper 
dextral reading and writing habits. These considerations suggest the 
value of employing quick, reliable methods for determining manual and 
ocular dominance. Instruments similar to those described herein 
will be found useful in this endeavor. 
THE RELATION OF INDIVIDUAL VARIABILITY TO 
GENERAL ABILITY AS MEASURED BY MENTAL 
TESTS 


MAX HERTZMAN 
College of the City of New York 


The main purpose of this study is to find the relation between the 
variability of a number of individuals in nine mental tests and their 
general mental ability as measured by these tests. The nine tests 
can be divided into three sub-groups with respect to material and 
three sub-groups with respect to structure. Therefore, the relation 
between the measure of individual variability and the ability measured 
by each of the sub-groups was also studied, in order to find its nature 
in different types of ability. 

Thorndike presents a study in which an attempt is made to find 
the relation between general intelligence as measured by standard 
intelligence tests and individual variability. However, his measure 
of variability was not taken over a range of different tests, but vari- 
ability was studied with reference to the repetitions of a particular 
test. He finds “no evidence of any tendency for variability to increase 
with ability.”² In many cases Thorndike used but two trials for a 
given test and consequently had to employ statistical corrections to 
estimate the variability of the individuals concerned. If an individual 
received a score of K on the first trial of a given test, his variability 
would be judged equal to the average deviation of the scores on the 
second trial made by all individuals who also had received a score of K 
on the first trial. Since correlations between two trials of a given 
test or of two forms of it are considered a measure of reliability, the 
measures of variation used by Thorndike were largely a function of 
the extent to which the tests were not reliable. In addition, he was 
making an unwarranted assumption: That individuals who had the 
same score on a particular test were consequently equally variable. 
The relationship between a person’s score and his variability is a fact 
to be determined and not to be assumed. 





1 The data used here have been supplied through the kindness of Dr. Smith, 
being the same as that originally used by him in his study. 

2 Thorndike, E. L., et al.: The Measurement of Intelligence. N. Y. Teachers 
College, Columbia Univ., 1927. 
In the present study, individual variability is actually determined 
for each individual and for a variety of different tests, not repetitions 
of the same test. 


PROCEDURE 


The Subjects—The subjects are discussed in greater detail by 
Smith.¹ It will suffice here to mention that they constituted a very 
homogeneous group, all being of the same sex, ninety-four per cent 
being of the same race, and most of them having similar educational 
backgrounds. They varied slightly in age, the standard deviation of 
their age being sixteen months, which is not very large when one 
considers that there were one hundred eighty-six subjects. All were 
students at the College of the City of New York, eighty per cent of 
them being either in the Upper Sophomore or Junior class. 

The Tests.—The tests employed were the nine in Smith’s special 
group.² These could be arranged in three different groups, both from 
the point of view of material and of structure. With reference to 
material, a verbal battery was formed from the sentence completion, 
the verbal analogies, and the verbal generalizations tests; a numerical 
battery from the number series, numerical analogies, and the numerical 
generalizations tests; and a spatial battery from the modified Kelley 
spatial (it was modified to be suitable for college students), the spatial 
analogies, and the spatial generalizations tests. With reference to 
structure, the analogies tests and the generalizations tests constituted 
two separate groups, and the remaining three tests, a third group that 
may be tentatively called a construction group, as the main function 
called forth by each of them is the completion of some “structure,” 
i.e., a sentence, a number series, or a geometrical figure. Some of the 
tests were invented for the purpose of the original study by Dr. Smith. 
Specific illustrations of typical items to be found in them are presented 
in his monograph.³ 

From the composition of the groups, it will be seen that each 
material group will have one test in common with each structure group 
(and vice versa), but there will be no such basis for similarity between 
any two material groups or any two structure groups. 

Comparable Scores.—Since the scores obtained are from nine 
different tests, with nine different means and standard deviations, 





1 Smith, G. M.: “Group Factors in Mental Tests Similar in Material or 
Structure.” Arch. Psych., No. 156, 1933. 

2 Ibid. 

3 Smith: Op. cit., pp. 16-17. 
they had to be made comparable for the purpose of this study. Hull’s¹ 
method of conversion was used, with converted means and standard 
deviations equal to fifty and fourteen respectively. With these 
constants, a range of more than six standard deviations between zero 
and one hundred could be obtained. In our converted tests no score 
deviated as much as three standard deviations from the mean, so 
there were no negative scores and none above one hundred. 
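
The conversion described above can be illustrated with a short Python sketch. It assumes the conversion is the ordinary linear rescaling of a score distribution to an assigned mean and standard deviation; the function name `convert_scores`, the use of the population standard deviation, and the sample raw scores are illustrative assumptions, not details taken from Hull's paper or from the present data.

```python
from statistics import mean, pstdev


def convert_scores(raw, new_mean=50.0, new_sd=14.0):
    """Linearly rescale raw scores to an assigned mean and SD."""
    m = mean(raw)
    s = pstdev(raw)  # assumption: population SD of the group tested
    if s == 0:
        raise ValueError("scores must not all be equal")
    return [new_mean + new_sd * (x - m) / s for x in raw]


if __name__ == "__main__":
    raw = [12, 15, 9, 20, 14]  # hypothetical raw scores on one test
    converted = convert_scores(raw)
    print([round(c, 1) for c in converted])
    print(round(mean(converted), 6), round(pstdev(converted), 6))
```

With a standard deviation of fourteen, the interval from zero to one hundred spans 100/14, or a little over seven standard deviations, in keeping with the remark that more than six are available.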

Composite Scores.—To get a measure representing ability in each 
of the sub-groups for each individual, the converted scores for each 
subject in each group were added together. The sum of their scores 
on the nine tests represented general test ability. All the tests were 
weighted equally. This was determined to be the wisest course 
from the examination of the correlations of the tests with respective 
“principal components,” as determined by Hotelling’s method. The 
correlations of each test with the principal component of its group 
showed decided homogeneity in the different groups. The range of 
correlations in the verbal group was .12, in the numerical .04, in the 
spatial .16, in the construction .18, in the analogies .03, and in the 
generalizations .04.² The individual correlations with the principal 
component through the nine tests also show little variation. Five 
of the correlations lay between .70 and .75, three between .60 and 
.69, and the remaining one was .56.³ 

The Measure of Variability.—In the present problem, it was neces- 
sary to get a measure of each individual’s variability over the range 
of his nine scores. Variability is commonly measured by the standard 
deviation of a distribution, but the value of this particular measure 
depends on the normality of the distribution and the number of cases 
involved. With only nine cases for each individual, it was felt that 
such a measure would be unwarranted, as very few of the distributions 
were likely to be normal, and the extreme deviations from the means 
would be unduly weighted. Therefore, the interquartile range was 
chosen instead. 
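
As an illustration of the measure chosen, the following Python sketch takes the interquartile range of one individual's nine converted scores. The text does not say how the quartiles of nine scores were located, so the "inclusive" interpolation rule used here, the function name `individual_variability`, and the sample scores are all assumptions.

```python
from statistics import quantiles


def individual_variability(scores):
    """Interquartile range (Q3 - Q1) of one individual's test scores."""
    if len(scores) < 2:
        raise ValueError("at least two scores are required")
    # Assumption: quartiles by linear interpolation over the sorted
    # scores (the "inclusive" method of the statistics module).
    q1, _median, q3 = quantiles(scores, n=4, method="inclusive")
    return q3 - q1


if __name__ == "__main__":
    # Hypothetical converted scores of one subject on the nine tests.
    nine_scores = [38, 42, 45, 50, 52, 55, 60, 63, 70]
    print(individual_variability(nine_scores))
```

Because only the two quartiles enter the measure, a single very high or very low score among the nine leaves it unchanged, which is the property the argument above relies on.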


THE RESULTS 


Evaluation of the Tests.—As explained in the previous section, the 
original tests were combined into various groups to form six groups 





1 Hull, C. L.: “The conversion of test scores into series which shall have any 
assigned mean and degree of dispersion.” J. of Applied Psych., Vol. VI, 1922, 
pp. 298-300. 

2 Smith: Op. cit., p. 44, Table VII. 

3 Ibid., p. 49, Table IX. 
purporting to measure special factors, and a seventh measuring general 
test ability. The first six variables in Table I represent the special 
groups, the seventh, to be designated S in this study, the sum of the 
nine original tests, and the eighth designated by V, the distribution 
of the measures of individual variability. 


TABLE I.—EVALUATION OF THE TESTS 

Variable                       Mean       SD    Sk/σ_Sk      Ku 
1. Verbal................       150    33.57      1.50     .271 
2. Numerical.............       150    33.45       .45     .255 
3. Spatial...............       150    34.66      1.02     .282 
4. Construction..........       150    32.76       .47     .260 
5. Analogies.............       150    33.29      2.00     .289 
6. Generalizations.......       150    33.56      1.87     .296 
7. S.....................       450    87.51      1.82     .272 
8. V.....................     16.34     6.01     -1.39     .247 














The means in the first seven variables, being the sums of the means 
of the constituent tests and these (in the converted series) being all 
equal, differ consequently only according to the number of tests 
involved in the groups. Since the mean of each of the original tests 
is fifty, the means of the sub-groups are three times fifty, and the 
mean of S is nine times fifty. The standard deviations of the variables 
were obtained by means of Spearman’s formula for the standard 
deviation of sums.¹ The latter involves the intercorrelations of the 
variables concerned, which were obtained from Smith.² The standard 
deviations of the six special groups are fairly close to each other. 
This is clearly due to the fact that intercorrelations in each group are 
of about the same magnitude, and that the use of converted standard 
deviations in finding the standard deviations of the sums actually 
means weighting all the original standard deviations involved equally. 
V shows a fair spread as compared with the size of its mean, the 
coefficient of variation being in this case 36.7. 
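
The standard deviation of a sum of correlated tests can be sketched as follows. The sketch uses the ordinary identity for the variance of a sum, which is assumed here to be what the formula cited above amounts to; the function name `sd_of_sum` and the intercorrelations of .45 are hypothetical and are not Smith's values.

```python
from itertools import combinations
from math import sqrt


def sd_of_sum(sds, corr):
    """SD of a sum of variables.

    `sds` lists the component standard deviations; `corr` maps each
    index pair (i, j) with i < j to the correlation between them.
    """
    variance = sum(s * s for s in sds)
    for i, j in combinations(range(len(sds)), 2):
        variance += 2 * corr[(i, j)] * sds[i] * sds[j]
    return sqrt(variance)


if __name__ == "__main__":
    # Three converted tests, each with SD 14, and hypothetical
    # intercorrelations of .45 throughout.
    r = {(0, 1): 0.45, (0, 2): 0.45, (1, 2): 0.45}
    print(round(sd_of_sum([14, 14, 14], r), 2))
```

With equal component standard deviations the result depends only on the intercorrelations, which is why groups with intercorrelations of similar size yield similar standard deviations.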


The skewness was determined by the formula given by Kelley,³ 
$Sk = \tfrac{1}{2}(P_{.90} - 2P_{.50} + P_{.10})$, and the standard error of skewness by 








1 Spearman, C.: “Correlation of sums and differences.” Brit. J. Psych., 
Vol. 4, 1912-1913, pp. 417-426. 

2 Op. cit., p. 24, Table II. 

3 Kelley, T. L.: Statistical Method, N. Y., Macmillan, 1923, p. 77. 
$\dfrac{.519\,(P_{.90} - P_{.10})}{\sqrt{N}}$.¹ In all cases the ratio of skewness to the standard 
error of skewness is below three, indicating that the skewness that 
does exist is probably not significant. 


Using the formulae $\dfrac{Q}{P_{.90} - P_{.10}}$ for measuring kurtosis and 
$.2778/\sqrt{N}$ for the standard error of kurtosis (both formulae given by 
Kelley), the kurtosis of the curves in the study was found not to differ 
significantly from the mesokurtic standard. The standard error of 
kurtosis being .020, and an ideal mesokurtic curve having a kurtosis of 
.263, it can be seen that variables one, two, three, four, seven, and 
eight differed from the ideal by less than one standard error, and the 
remaining two, by less than two standard errors. 
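
The comparison with the mesokurtic standard can be checked with a few lines of Python. The kurtosis values are those of Table I and N is the one hundred eighty-six subjects; the function names are illustrative. With N = 186 the quoted formula .2778/√N gives a standard error of roughly .020.

```python
from math import sqrt

N = 186              # number of subjects
MESOKURTIC = 0.263   # kurtosis of an ideal mesokurtic curve
KU = {1: 0.271, 2: 0.255, 3: 0.282, 4: 0.260,
      5: 0.289, 6: 0.296, 7: 0.272, 8: 0.247}  # Ku column of Table I


def kurtosis_standard_error(n):
    """Standard error of the percentile kurtosis measure, .2778/sqrt(N)."""
    return 0.2778 / sqrt(n)


def deviations_in_se(ku_values, n):
    """Distance of each Ku from the mesokurtic value, in SE units."""
    se = kurtosis_standard_error(n)
    return {k: abs(v - MESOKURTIC) / se for k, v in ku_values.items()}


if __name__ == "__main__":
    print(round(kurtosis_standard_error(N), 4))
    for variable, dev in deviations_in_se(KU, N).items():
        print(variable, round(dev, 2))
```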





TABLE II.—CORRELATIONS OF THE SUB-GROUPS AND S WITH V 

Variable                     Correlation      PE 
1. Verbal................       -.161        .048 
2. Numerical.............       -.236        .047 
3. Spatial...............       -.278        .046 
4. Construction..........       -.224        .047 
5. Analogies.............       -.305        .045 
6. Generalizations.......       -.189        .047 
7. S.....................       -.264        .046 








The Correlations with the Measure of Variability.—Table II presents 
the correlations and probable errors of the measures of individual 
variability with each of the six special groups and with S. It is 
interesting to note that all seven correlations are negative, that is, 
they indicate that there is a tendency for the existence of an inverse 
relation between the magnitude of an individual’s scores and the 
variability he manifests in a number of mental functions. It is to be 
noted, however, that the absolute values of these correlations are 
rather small, ranging from .161 to .305. Moreover, one of them (the 
correlation with the verbal group) is less than 4 PE greater than zero. 
Variable seven, which most reliably represents general test ability, 
has an absolute value of only .264 and differs from zero by only 5.74 PE. 
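
The probable errors and ratios discussed above can be reproduced approximately with the customary formula PE = .6745(1 - r²)/√N. The text does not state which formula was used, so this choice is an assumption, as are the function names; it returns the tabled values for the verbal group and for S.

```python
from math import sqrt


def probable_error_r(r, n):
    """Probable error of a correlation: .6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1 - r * r) / sqrt(n)


def ratio_to_pe(r, n):
    """How many probable errors the correlation lies from zero."""
    return abs(r) / probable_error_r(r, n)


if __name__ == "__main__":
    n = 186  # number of subjects
    for label, r in (("Verbal", -0.161), ("S", -0.264)):
        print(label, round(probable_error_r(r, n), 3),
              round(ratio_to_pe(r, n), 2))
```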





1 Dunlap, J. W. and Kurtz, A. K.: Handbook of Statistical Nomographs, Tables 
and Formulas. Yonkers: World Book Co., 1932. 
It is evident that although one may conclude from these results 
that the relation between general test ability and variability is nega- 
tive, he must stress the small amount of the negative correlation. 

In examining the correlations with each of the sub-groups, while 
differences in magnitude are discovered, these are no larger than those 
one could expect to arise by chance. The difference between the 
largest and the smallest of these correlations is only .144. This 
difference is much smaller than would seem offhand, as the correlations 
involved are rather close to zero. The coefficient of alienation of the 
larger correlation is .954, and of the smaller .987, the difference being 
only .033. The ratio between the difference and probable error of 
the difference is 2.2. This indicates that the difference is possibly 
unreliable, statistically speaking; the small difference between the 
coefficients of alienation indicates that it would not be of great impor- 
tance. The ratio of the difference to the PE of the difference in any 
other pair chosen from the six special groups will be less than 2.2. 
(As the PE of the difference will be about the same or higher than 
in the instance cited and the difference will be smaller.) 
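
The ratio of 2.2 cited above can be checked with a short Python sketch. It assumes the probable error of a difference is the square root of the sum of the squared probable errors, which treats the two correlations as independent; that simplification and the function names are assumptions, not statements from the text.

```python
from math import sqrt


def pe_of_difference(pe_a, pe_b):
    """Probable error of a difference between two independent values."""
    return sqrt(pe_a ** 2 + pe_b ** 2)


def difference_ratio(r_a, pe_a, r_b, pe_b):
    """Ratio of the difference between two correlations to its PE."""
    return abs(r_a - r_b) / pe_of_difference(pe_a, pe_b)


if __name__ == "__main__":
    # Largest and smallest correlations in Table II, with their PEs.
    print(round(difference_ratio(-0.305, 0.045, -0.161, 0.048), 1))
```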

While in general it is still possible that these groups are related to 
individual variability in a differential manner, the differences in 
the magnitude of the relation cannot be of much significance. 

Significance of the Extremes.—It may be held that the negative 
correlations were obtained because no scaling procedure was used, 
and that actually the relationship is other than the one obtained here. 
If the units are not equal, it is probable that the units of the extremes 
of the distributions are greater than those in the middle. If this is so 
in the present study, the measures of variability for the individuals 
who are extremely good or extremely bad might actually be different 
from those obtained. As the measure of variability was the difference 
between two scores, a scaling process would in the case of the extreme 
individuals actually increase the magnitude of the difference as 
compared with an identical unscaled absolute difference of an average 
individual. But if we compare a group of individuals on the extreme 
negative end of a distribution with a group on the extreme positive 
end, we would find that they would be affected in about the same way 
by scaling.¹ In comparing the magnitude of the measure of variability 
in the two opposite extremes, scaled and unscaled data would give 





1 This assumes that our curves are fairly normal. In this study although the 
normality of the curves has not been rigidly tested, the measures of skewness and 
kurtosis seem to indicate that the curves are normal enough for our purpose. 
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the same relative results. More concretely, the effect of scaling the 
extremes can more easily be understood when one realizes that it is 
as difficult to advance from the lowest to the next to the lowest score, 
as from the second highest to the highest. 

Consequently, the highest ten per cent of the individuals in each 
of the sub-groups and of S were compared with the corresponding 
lowest ten per cent with reference to the measure of variability. Ten 
per cent were chosen so as to represent the extremes adequately, and 
yet not to include too few cases. Since the number of the subjects 
was one hundred eighty-six, nineteen individuals could be chosen in 
each case, although in three instances tie scores required the use of 
twenty individuals, and in one of twenty-one. 

Table III presents the averages of V for each of the extremes and 
the necessary measures for finding the reliability of the difference 
between the extremes of the respective curves. It is seen that in 
four cases out of the seven (including S), the chances, interpreted 
from the ratio of the difference to the probable error of difference, are 
one hundred out of one hundred that the differences are reliable. 
In the remaining three instances the chances are over ninety-eight 
in one hundred. The variable representing the sum, which in general 
we have considered the most reliable measure, also has the highest 
ratio, that of 6.40. 
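
The "chances in one hundred" reported in Table III can be approximated from the ratio of a difference to its probable error. The sketch below is a minimal illustration in Python, assuming a normal sampling distribution and the conventional relation that one probable error equals 0.6745 standard deviations. The function name `chances_in_100` is hypothetical, and the text does not state which conversion table was actually used, so small disagreements with the printed values are to be expected.

```python
import math

# Assumption: one probable error (PE) equals 0.6745 standard deviations
# of a normal distribution.
PE_PER_SD = 0.6745


def chances_in_100(ratio_diff_to_pe: float) -> float:
    """Chances in 100 that the true difference is greater than zero.

    ratio_diff_to_pe is the observed difference divided by its probable error.
    """
    z = ratio_diff_to_pe * PE_PER_SD  # convert PE units to SD units
    # Normal cumulative probability up to z, via the error function.
    return 100.0 * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


if __name__ == "__main__":
    # Ratios taken from the Diff/PE column of Table III.
    for ratio in (3.47, 3.59, 3.81, 4.30, 4.36, 6.22, 6.40):
        print(f"Diff/PE = {ratio:4.2f} -> {chances_in_100(ratio):6.2f} chances in 100")
```

Under these assumptions a ratio of zero corresponds to fifty chances in one hundred, and ratios above four are practically indistinguishable from certainty, which agrees with the way the table rounds its largest ratios to one hundred.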

The differences between the averages are fairly large, when one 
considers that the standard deviation for the distribution of V is 6.01. 

TABLE III.—RELIABILITY OF THE DIFFERENCES BETWEEN THE AVERAGES (WITH
RESPECT TO V) OF THE UPPER AND LOWER TEN PER CENT EXTREMES OF THE
SUB-GROUPS AND OF S

                        Average     Average     Difference          Diff.
Variables               (upper)*    (lower)†    (lower − upper)    PE diff.    Ch.‡

1. [illegible]            13.6        18.4          4.8              4.36      100
2. Numerical              12.7        17.7          5.0              3.47       98.5
3. Spatial                12.1        20.0          7.9              6.22      100
4. Constructions          13.5        19.0          5.5              4.30      100
5. Analogies              12.7        18.0          5.3              3.81       99
6. Generalizations        14.4        19.5          5.1              3.59       99
7. S                      11.0        18.1          7.1              6.40      100

* Average of the upper extreme.
† Average of the lower extreme.
‡ The number of chances in one hundred that the difference is reliable.

Table III, then, shows a fairly reliable difference with reference to
the measure of variability in all seven pairs of extremes. The
average in the lower extreme is greater in all cases than the
corresponding average in the upper extreme. In view of the preceding
discussion on the effect of scaling, it is felt that these results support
the view that the true correlation between test ability and individual
variability is negative.

Again, there does not seem to be a very marked difference in the 
relations obtained here for any particular variable. All show the same 
tendencies to about the same degree. The largest difference between 
two averages in the upper extreme among the sub-groups is 2.3 
(between generalizations and spatial), and in the lower extreme 
2.3 also (between numerical and spatial). Since all the possible 
probable errors of the difference are greater than one, and the difference 
between any two pairs both members of which are taken from either 
the upper or the lower extremes is at most 2.3 (and in most cases 
much less), these differences are evidently unreliable. And all of the 
differences in the sub-groups between extremes of the same curve are 
of about equal significance (Table III). 

Rank Order Correlations within the Extremes.—While a negative 
relation was found for the distributions in general, it is possible that 
the relation within the extremes taken as a group might vary from the 
general one. For instance, it is possible that within the negative 
extremes the lower individuals would be less variable than the rela- 
tively higher ones. 

Rank order correlations were obtained between the ability manifested
in extremes of the sub-groups and of S and V. In the case of the
V variable the scores were arranged from lowest to highest, the lowest 
being given rank one, and the others being ranked accordingly. The 
procedure was slightly different in the other variables concerned. In 
the upper extremes the highest score was given the first rank, and the 
others ranked accordingly, while in the lower extremes the procedure 
was reversed. Consequently, a positive correlation in the upper 
extremes would mean that high scores tended to go with low variability, 
and a negative correlation would lead to the opposite conclusion. 
In the lower extremes a positive correlation would mean
that lower (more extreme) scores tended to go with low variability,
and a negative correlation would mean the reversed relation.
Table IV gives the correlations found, together with the probable
errors of some of the higher correlations. Perhaps the most interesting 
aspect of the table is the variety of correlation coefficients it presents 
as there are correlations ranging from —.43 to .52. However, because 
of the small number of cases, the probable errors are very large and 
all but one correlation is less than 4 PE greater than zero. In the 
upper extremes of the seven variables, all but one of the correlations 
(that with variable three) are positive. Without regarding the 
quantitative values too seriously, we can nevertheless interpret this 
as indicating that the general rule also holds in the upper extremes. 
In the lower extremes two of the correlations are negative (one having
a fairly large magnitude), and five are positive. This, superficially
at any rate, would indicate that the general relation is reversed to
some extent here. However, it should be noted that all but one of
these positive correlations are very low.
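
The statement that all but one correlation lies within 4 PE of zero can be checked with a short calculation. The sketch below is only illustrative: it assumes the classical textbook formula for the probable error of a rank-order coefficient, PE = 0.7063(1 − ρ²)/√n, and a group size of nineteen; neither the formula nor the exact n used for each entry is stated in the text, and the function names are hypothetical.

```python
import math


def probable_error_rho(rho: float, n: int) -> float:
    """Probable error of a rank-order correlation.

    Assumption: the classical formula 0.7063 * (1 - rho^2) / sqrt(n).
    """
    return 0.7063 * (1.0 - rho ** 2) / math.sqrt(n)


def ratio_to_pe(rho: float, n: int) -> float:
    """Number of probable errors by which the coefficient differs from zero."""
    return abs(rho) / probable_error_rho(rho, n)


if __name__ == "__main__":
    n = 19  # assumed size of one ten-per-cent extreme group
    # The four "higher" correlations for which a PE is printed in Table IV.
    for rho in (0.35, 0.40, -0.43, 0.52):
        pe = probable_error_rho(rho, n)
        print(f"rho = {rho:+.2f}  PE = {pe:.2f}  |rho|/PE = {ratio_to_pe(rho, n):.1f}")
```

With these assumptions only the coefficient of .52 exceeds four probable errors, which is consistent with the caution urged in the text.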


TABLE IV.—RANK ORDER CORRELATIONS BETWEEN ABILITY AND V IN THE TEN
PER CENT UPPER AND LOWER EXTREMES OF THE SUB-GROUPS AND S

                           Upper extreme            Lower extreme
Variables               Correlation    PE*       Correlation    PE*

1. [illegible]              .35        .14           .22
2. Numerical                .19                     −.43        .14
3. Spatial                 −.11                      .08
4. Constructions            .18                      .13
5. Analogies                .27                     −.18
6. Generalizations          .14                      .52        .12
7. S                        .40        .14           .10

* The PE of the higher correlations only were calculated.


In general these correlations can not show anything reliably in a 
quantitative way. The small spread in scores for V in each of the 
extremes gives, in many cases, undue weight in rank to a value that 
does not differ very much in absolute value from another score very 
different in rank. Scaling, moreover, would tend to increase the 
magnitude of the V scores of the lower individuals in the lower 
extremes, thus making possibly for negative correlations which could 
fit the general scheme. In the upper extremes such scaling would now 
decrease the positive correlations. 
The variability displayed by the correlation coefficients themselves
indicates that the small sampling is inadequate to give very reliable
results in this phase of the study. The large positive correlation in
the lower extremes of variable six can hardly be explained as anything
but an extreme chance deviation.


SUMMARY AND CONCLUSIONS 


1. Correlations were obtained between the sum of nine mental 
tests, measuring general test ability, and a measure of individual 
variability, and also between six sub-batteries and the same measure 
of individual variability. These were found to have low negative 
values indicating a slight tendency for high scores in each of the seven 
groups to go with low scores in variability. 

2. No significant differences between the correlations of the
sub-groups with the measure of variability were found.

3. A study of the averages of the measure of individual variability 
in the ten per cent upper and lower extremes indicated that the 
differences between the averages of the extremes of the different 
variables were reliable. This reinforced the conclusion stated above, 
and helped to demonstrate that the negative correlations were not 
due to the lack of scaling procedure. 

4. The study, then, showed that extremely high individuals tended
to be more consistently high from test to test of a series than those with
extremely low scores tended to be consistently low.

5. Rank order correlations in the extremes of the distributions 
between test scores and the measure of variability were obtained, but 
these had to be interpreted with great caution, because of the errors 
involved in the small samplings. Some of them tended to show that 
the same relation that existed between individual variability and the 
entire distribution of general ability, also existed in these smaller 
groups. In the lower extreme groups there was a slight tendency for
the more general relationship to be reversed. 
A NOTE ON CORRELATION BY RANKS


A. C. ROSANDER 
Research Fellow, General Education Board, Bronxville, N. Y. 


A problem of importance in connection with rank order correlation 
is the effect of deviation from the true rank order upon the magnitude 
of the correlation coefficient. In connection with the use of the method 
of rank order in psychological measurement, the writer analyzed this 
problem and obtained some expressions which have proved useful. 

We make two assumptions, the first being that the true rank order 
is known or can be found, and second that the deviations of any 
individual ranking from this true rank order will spread themselves 
more or less evenly throughout the entire range of ranks. The more 
the data deviate from this assumption, the less accurate will be the 
expressions which we will now proceed to derive. We shall let epsilon
($\epsilon$) stand for the mean deviation from the true rank order, and use the
Spearman form of rank order correlation. We shall first express the
rank order correlation coefficient in terms of this mean deviation from
the true rank order ($\epsilon$).





























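If $\epsilon$ equals 0 then $\rho = 1 - \dfrac{6n(0)^2}{n(n^2-1)} = 1 - 0$

If $\epsilon$ equals 1 then $\rho = 1 - \dfrac{6n(1)^2}{n(n^2-1)} = 1 - \dfrac{6}{n^2-1}$

If $\epsilon$ equals 2 then $\rho = 1 - \dfrac{6n(2)^2}{n(n^2-1)} = 1 - \dfrac{24}{n^2-1}$

If $\epsilon$ equals 3 then $\rho = 1 - \dfrac{6n(3)^2}{n(n^2-1)} = 1 - \dfrac{54}{n^2-1}$

If $\epsilon$ equals m then $\rho = 1 - \dfrac{6nm^2}{n(n^2-1)} = 1 - \dfrac{6m^2}{n^2-1}$

If we use $\epsilon$ instead of m in our general formula it becomes

$$\rho = 1 - \frac{6\epsilon^2}{n^2-1} \tag{1}$$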


RELATION BETWEEN $\rho$ AND $\epsilon$ FOR A CORRELATION OF ZERO


The condition for the rank order correlation to be zero is for the 
last term of the general formula just given to be equal to unity. 
Therefore setting this term equal to unity and solving for $\epsilon$ we obtain
the following expression

$$\epsilon = \sqrt{\frac{n^2-1}{6}} \tag{2}$$

Computation of $\epsilon$ for various values of n showed that there was a very
close constant ratio between the two. Hence we solved for the ratio of
n/$\epsilon$ which gave us the following equation:

$$\frac{n}{\epsilon} = \sqrt{6 + \frac{1}{\epsilon^2}} \tag{3}$$

Now if $\epsilon$ is relatively large, then the second term in the right hand side
of the equation (3) can be ignored so that we obtain the following
simplified form:

$$\frac{n}{\epsilon} = \sqrt{6} = 2.45 \tag{4}$$

We give several different values of n and the corresponding values for 
the mean deviation in rank order so that the condition for zero correla- 
tion will be met. Notice that in every case the mean deviation is 
approximately forty-one per cent of n. 


TABLE I.—MEAN DEVIATION FROM TRUE RANK ORDER FOR ZERO CORRELATION
FOR n RANKS

    $\epsilon$        n

     2.0         5
     4.0        10
     6.1        15
     8.2        20
    10.2        25
    12.2        30
    16.3        40
    20.4        50
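
The zero-correlation condition can be checked numerically. The following Python sketch evaluates equations (1) and (2) for the values of n listed in Table I; the function names are hypothetical and the code is offered only as a convenience for readers who wish to verify the forty-one per cent rule, not as part of the original analysis.

```python
import math


def rho_from_mean_deviation(epsilon: float, n: int) -> float:
    """Equation (1): rho = 1 - 6 * epsilon^2 / (n^2 - 1)."""
    return 1.0 - 6.0 * epsilon ** 2 / (n ** 2 - 1)


def epsilon_for_zero_correlation(n: int) -> float:
    """Equation (2): the mean deviation at which rho becomes zero."""
    return math.sqrt((n ** 2 - 1) / 6.0)


if __name__ == "__main__":
    for n in (5, 10, 15, 20, 25, 30, 40, 50):
        eps = epsilon_for_zero_correlation(n)
        # n / eps approaches sqrt(6) = 2.45 as n grows, per equations (3) and (4).
        print(f"n = {n:2d}  epsilon = {eps:5.2f}  "
              f"epsilon/n = {eps / n:.3f}  n/epsilon = {n / eps:.3f}")
```

For every n in the table the ratio of $\epsilon$ to n stays close to 0.41, and substituting the computed $\epsilon$ back into equation (1) returns a correlation of zero.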


PERCENTAGE DEVIATION FROM PERFECT CORRELATION 


The second term of equation (1) really shows to what extent the 
individual ranking deviates from a perfect correlation since the smaller 
this term the higher the correlation, whereas the larger this term the 
lower the correlation. Therefore we can write at once an expression 
for the actual deviation of the ranking in question from a perfect 
ranking; if we multiply by one hundred we will obtain our results 
directly in per cent: 

$$p_d = \frac{600\epsilon^2}{n^2-1} \tag{5}$$

where $p_d$ stands for the per cent deviation from perfect correlation.
SPECIAL FORMULAE WHEN n IS TWENTY-FIVE


The writer has been working with rankings of twenty-five elements, 
and in connection with an extended study has developed some formulae 
which may be used either for primary computation, for checking 
results obtained by other methods, or for obtaining a rough estimate. 
The rank order product formula for obtaining rank order correlations 
on the basis of products of paired ranks can be re-written in the
following form, an equation which is linear in $\Sigma XY$ and $\rho_{25}$:

$$\Sigma XY = 1300\rho_{25} + 4225 \tag{6}$$

where XY is the product of paired ranks. This can also be written so
that the computation of $\rho_{25}$ is facilitated:

$$\rho_{25} = \frac{\Sigma XY - 4225}{1300} \tag{7}$$




Another equation which has been found very useful is the one which 
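shows the relation between $\rho$ and $\epsilon$ when n is twenty-five. With n
equal to twenty-five the last term of equation (1) becomes $\epsilon^2/104$. The error
involved in using one hundred in place of one hundred four is very slight for most
computations. This leads to an equation in which the deviation from
a true correlation is proportional to the mean deviation squared.

$$\rho_{25} \approx 1 - \frac{\epsilon^2}{100} \tag{8}$$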


Table II shows what correlation coefficients we should expect
when the mean deviation from the true rank order is that which is
shown in the column designated "$\epsilon$." These data are based upon
equation (8).

TABLE II.—EFFECT OF MEAN DEVIATION FROM TRUE RANK ORDER UPON
CORRELATION FOR TWENTY-FIVE RANKS

    $\epsilon$      $\rho_{25}$

    0.0      1.00
    1.0       .99
    2.0       .96
    3.0       .91
    4.0       .84
    5.0       .75
    6.0       .64


Due to the nature of this equation it gives only approximate values
when $\epsilon$ is increased to ten. Actually a value slightly greater than ten
is necessary in order to give a zero correlation; in Table I the correct
value is 10.2.
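
The size of the error introduced by the approximation can be seen by tabulating both forms side by side. The sketch below is a hypothetical illustration in Python: `rho_exact_25` applies equation (1) with n equal to twenty-five and `rho_approx_25` applies equation (8); neither name comes from the original note.

```python
def rho_exact_25(epsilon: float) -> float:
    """Equation (1) with n = 25, so that n^2 - 1 = 624."""
    return 1.0 - 6.0 * epsilon ** 2 / 624.0


def rho_approx_25(epsilon: float) -> float:
    """Equation (8): the rounded form, using 100 in place of 104."""
    return 1.0 - epsilon ** 2 / 100.0


if __name__ == "__main__":
    print("epsilon   exact   approx   error")
    for epsilon in range(0, 11):
        exact = rho_exact_25(epsilon)
        approx = rho_approx_25(epsilon)
        print(f"{epsilon:7d}   {exact:5.2f}   {approx:6.2f}   {exact - approx:5.3f}")
```

The discrepancy grows with the square of $\epsilon$: it is negligible for the small deviations typical of good sorters and reaches roughly four hundredths at a mean deviation of ten ranks.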

In the ranking of statements and situations which the author 
employs in the construction of social attitude scales, most sorters 
selected from among graduate students in psychology and education
have a mean deviation of from two to three ranks. The best raters
have an average deviation of one rank or less while the poorer sorters 
have a mean deviation of four or five ranks. In general the curve of 
ranking is skewed since the usual rater has a mean deviation of about 
two. To date the writer has found no one among the sorters used who 
had a mean deviation as high as six ranks in a total of twenty-five. In 
this type of rating, the average rating of a large number of selected 
judges, from fifty to one hundred, is taken as the true rank order of 
the statement. 

It might be well to add that the analysis which we have given might 
be applied to any two sets of rank orders even though neither may be 
accepted as the true rank order. If my rankings of twenty-five 
students on a given trait differ by ten ranks from your rankings of 
these same students, then by the data in Table I, the correlation 
between our rankings will be approximately zero even though neither 
of our rankings can be considered a true ranking. The main assump- 
tions that must be met, if this is to be true, are those mentioned at the 
beginning: That the rater is just as likely to make an error in ranking 
at one point of the series as at another point, and that these errors are 
all of the same magnitude. While these limitations may seem 
stringent, they do not destroy the essential value of these equations 
in practical work. 
STATISTICAL ABILITY NECESSARY TO READ
EDUCATIONAL JOURNALS 


JOHN W. DICKEY 
State Normal School, Newark, N. J. 


One important objective of A First Course in Tests and Measure- 
ments is to develop the ability of the student to read the professional 
journals. The instructor has the problem of maintaining the proper 
balance between stressing statistics to such a degree that he is assum- 
ing the burden of A First Course in Statistics, and of neglecting statistics 
to the extent that the student will be unable to read the educational 
and psychological literature in the journals. A survey of the technical 
language used would, therefore, aid in the solution of this problem of 
maintaining this proper balance. 

The aim of this paper is to report the findings of a survey of a 
number of professional journals to determine the number and kind 
of statistical measures used.¹ Studies related to this problem have
been made. Studies of the trends² in psychology and education
reveal an increase in the use of statistical measures. Recent vocabulary
studies³ when compared with an earlier study⁴ indicate similar
trends. A study⁵ of difficulties encountered by students in statistics,
and two recent studies⁶ attempting to standardize the symbolism of
statistics, to say nothing of the numerous textbooks entirely devoted
to statistics, also indicate the increasing importance of these tools in
education and psychology. In fact statistics is the language of
experimentation in the social sciences.

1 The writer wishes to express his appreciation to Miss Lillian Uslan, a recent
graduate of the New Jersey State Normal School at Newark, New Jersey, for her
valuable assistance in conducting this survey. Grateful appreciation is also
given to W. V. Singer, Head of the Department of Education of the State Normal
School at Newark.

2 Goodenough, Florence L.: "Trends in Modern Psychology." Psychological
Bulletin, Vol. XXXI, No. 2, Feb., 1934, pp. 81-98.

Maller, J. B.: "Forty Years of Psychology." Psychological Bulletin, Vol.
XXXI, No. 8, Oct., 1934, pp. 533-559.

3 Jensen, M. B.: "Relative Values of Vocabulary Terms in General
Psychology." Psychological Review, Vol. XL, March, 1933, p. 96.

Warren, H. C.: Dictionary of Psychology. New York: Houghton Mifflin
Company, 1934.

4 Odell, C. W.: A Glossary of 300 Terms Used in Educational Measurement and
Research, University of Illinois Bulletin, No. 40, Urbana, Ill., 1920.

5 Brown, Ralph: Mathematical Difficulties of Students of Educational Statistics.
New York: Teachers College Contribution to Education, No. 569, Bureau of
Publications, Teachers College, 1933.

6 Monroe, Walter S.: "Standardization of Statistical Symbolization." Journal
of Experimental Education, Vol. I, March, 1933, pp. 223-228.

Yntema, T. O.: "Some Comments on Materials for Teaching Statistics, with
Special Reference to the Use of Symbols." Proceedings of the American Statistical
Association, Vol. XXVIII, March, 1933, pp. 15-19.

The method followed in this study was first that of obtaining from
the heads of the departments of the New Jersey State Normal School
at Newark the choices of professional journals available for teachers
in the elementary schools. These department heads suggested a
total of some thirty journals, and in terms of frequency of suggestion
and the writer's judgment seven journals were chosen as follows:
Educational Administration and Supervision, The Elementary School
Journal, The Journal of Educational Psychology, The Journal of
Educational Research, The Journal of Educational Sociology, Progressive
Education, and The School Review.

The issues of the journals selected for the survey, together with
the number of articles surveyed in each journal and the number of
these using statistical measures, are shown in Table I.

Five issues of each of the seven journals were selected as a sample
of each of the four journal-years (1925-1926; 1928-1929; 1931-1932;
and 1934-1935). A total of eight hundred ninety-eight articles
were surveyed, and three hundred forty-two, or 38.1 per cent, of these
use statistical measures. The Journal of Educational Psychology
had the greatest percentage of articles using statistics (81.2 per cent,
according to Table I); Progressive Education, the least (0 per cent).
The averages (37.4 per cent, 39.1 per cent, 45.0 per cent, 32.4 per cent)
for the four journal-years chosen would indicate a gradual increase
in the number of articles using statistical measures, save the average
for the 1934-1935 year which may have a spurious decrement. In
general it is safe to conclude that about four out of ten articles in the
journals chosen require some knowledge of statistics in order to be
read with a fair degree of understanding.

The nature of the statistical measures used in these journals
together with the frequency of each measure is shown in Table II.
A measure is given a frequency of one (1) when it is found one or more
times in a given article. If, for example, the mean (M) occurs five
times in one article, the understanding of this is no more difficult as
far as the statistics is concerned, than if it occurred only once. What
are the measures (including concepts) which should be taught if

TABLE I.—SHOWING THE ISSUES OF THE JOURNALS SURVEYED TOGETHER WITH
THE NUMBER OF ARTICLES IN EACH JOURNAL, THE NUMBER OF ARTICLES
USING STATISTICS, AND THE TOTALS BOTH IN ABSOLUTE- AND IN
PERCENTAGE-NUMBERS

                   Educ.     Ele.      Jour.     Jour.     Jour.     Progres-  School                        Journal-
Issues used        Admin.    School    Educ.     Educ.     Educ.     sive      Review    Total     Per cent  year
                   and       Jour.     Psy.      Research  Sociol-   Educa-                                  average⁴
                   Supv.                                   ogy       tion

September, 1925    6-1¹      8-3       9-6       5-1       ³         8-0²      5-1       41-12     29.3
November, 1925     6-1       7-1       6-5       5-1       ³         7-0²      7-2       38-10     26.3
January, 1926      8-3       6-1       7-7       6-4       ³         4-0²      6-4       35-17     49.5      37.4
March, 1926        8-2       7-3       4-4       6-4       ³         4-0²      6-4       35-17     48.5
May, 1926          9-3       6-1       7-7       6-3       ³         11-0²     7-3       46-17     37.0

September, 1928    8-2       6-3       9-4       7-7       5-1       ³         8-2       43-19     44.2
November, 1928     7-3       7-5       8-6       8-3       4-0       ³         6-3       40-19     47.5
January, 1929      8-3       7-1       7-5       6-3       5-1       6-0²      7-2       46-15     32.6      39.1
March, 1929        7-4       5-1       9-6       6-5       5-0       6-0²      6-2       44-18     40.9
May, 1929          10-4      7-3       5-4       6-5       4-0       10-0²     5-0       52-16     30.2

September, 1931    7-4       7-2       11-8      5-3       5-0       ³         6-4       41-21     51.2
November, 1931     8-1       6-1       8-6       9-7       7-0       6-0       6-4       50-19     38.0
January, 1932      ³         6-2       8-8       6-5       6-3       6-0       6-2       38-20     52.6      45.0
March, 1932        8-2       7-2       9-8       5-2       7-3       5-0       6-4       47-21     44.7
May, 1932          12-0      6-3       8-8       8-6       7-0       10-0      6-5       57-22     38.6

September, 1934    8-1       6-3       7-6       7-2       7-3       ³         6-0       41-15     36.6
November, 1934     10-3      5-4       8-6       5-2       7-3       11-0      5-3       51-21     41.2
January, 1935      8-4       6-2       8-8       4-3       5-0       12-0      5-3       48-20     41.7      32.4
March, 1935        8-2       6-3       11-9      11-0      7-0       14-0      6-3       63-17     27.0
May, 1935          ³         5-1       ³         4-2       9-0       15-0      6-3       39-6      15.4

Total              146-42    126-45    149-121   125-68    90-14     141-0     121-52    898-342
Per cent           28.8      35.7      81.2      54.4      15.6      0.0       43.0      38.1

1 This table is read as follows: The September, 1925, issue of the professional journal,
Educational Administration and Supervision, provided six (6) articles for our survey, and one (1) of these
six articles used statistical measures of the type listed in Table II. The "Total" shows that of
the forty-one articles surveyed for September, 1925, twelve, or 29.3 per cent, of them used these
measures.

2 These issues are quarterly, and in this case the entire three months were surveyed and listed
as of one month.

3 In these cases the journals were not available.

4 These percentages are the averages for the journal-years.


arbitrarily we use a frequency of ten (10) as the dividing line?¹ These
measures are: Mean, median, average, standard deviation, range,





1 Tables and Graphs are not included in this study because it is believed that 
their widespread use requires that they be taught if not already understood. 
TABLE II.—STATISTICAL MEASURES WITH THEIR FREQUENCIES, AS FOUND IN THE
PROFESSIONAL JOURNALS SURVEYED¹

                                                                  Journals
                                             Educ.    Ele.     Jour.    Jour.     Jour.
Measures²                                    Admin.   School   Educ.    Educ.     Educ.     School   Total³
                                             and      Jour.    Psy.     Research  Sociol-   Review
                                             Supv.                                ogy

Central tendency
  Mean                                         13       7        67       45        7         14      153
  Median                                       16      12        31       19        2         21      101
  Average                                       5      20         9        5        1         23       63
  [illegible]                                   1       0         0        1        0          1        3
Variability
  Standard deviation                            4       5        53       29        1          7       99
  Range                                         6       8        20       13        2          3       52
  Probable error                                4       3        24       13        0          0       44
  Coefficient of variation                      2       0         6        2        0          1       11
  [illegible]                                   1       1         3        2        0          3       10
  Average deviation                             0       0         1        2        1          2        6
  [illegible]                                   0       1         1        0        0          2        4
Correlation
  Pearson r                                    10      12        82       26        2         16      148
  [illegible]                                   1       0         2        2        0          0        5
  Spearman-Brown formula                        0       0         4        0        0          0        4
  [illegible]                                   0       0         1        0        3          0        4
  [illegible]                                   0       0         0        1        1          0        2
  Validity coefficient                          0       0         1        0        0          0        1
  [illegible]                                   0       0         1        0        0          0        1
  Contingency coefficient                       0       0         0        0        1          0        1
Reliability
  Difference                                    2       9        28       18        1          8       66
  Standard deviation of a difference            0      12         8        0        0          2       22
  Reliability coefficient                       0       1        12        2        0          1       16
  Standard deviation of the mean                0       1         9        2        0          2       14
  Difference divided by the standard
    deviation of the difference                 1       0         9        4        0          0       14
  Reliability                                   1       0         4        8        0          1       14
  Critical ratio                                1       1         5        3        1          2       13
  Probable error of the mean                    2       1         6        1        1          2       13
  Probable error of a difference                0       2         5        3        0          0       10
  [illegible]                                   1       0         3        1        0          1        6
  Experimental coefficient                      0       0         3        1        0          1        5
  Chances in one hundred                        1       0         4        0        0          0        5
  Standard error of estimate                    0       0         2        0        0          0        2
  Standard error of regression coefficient      0       0         2        0        0          0        2
  [illegible]                                   0       0         1        0        0          0        1
  Chances in one thousand                       0       0         0        0        0          1        1
  Chances in ten thousand                       0       0         1        0        0          0        1
  Error of prediction                           0       0         1        0        0          0        1
  Difference is significant (chances)           0       0         0        1        0          0        1

probable error, coefficient of variation (V), Pearson r, difference,
standard deviation of a difference, reliability coefficient, standard


TABLE II.—Continued

                                                                  Journals
                                             Educ.    Ele.     Jour.    Jour.     Jour.
Measures²                                    Admin.   School   Educ.    Educ.     Educ.     School   Total³
                                             and      Jour.    Psy.     Research  Sociol-   Review
                                             Supv.                                ogy

Miscellaneous
  Quartiles                                     7       2        13       16        2         10       50
  Frequency                                     5       6         8       10        1          2       32
  Rank                                          5       2         0        2        1          6       16
  Achievement quotient (AQ)                     0       0         4        3        2          0        9
  [illegible]                                   0       0         6        1        0          0        7
  Educational quotient (EQ)                     1       0         3        2        1          0        7
  [illegible]                                   0       1         3        2        0          0        6
  [illegible]                                   0       0         4        1        0          0        5
  [illegible]                                   4       0         0        0        0          0        4
  [illegible]                                   0       3         1        0        0          0        4
  Educational age (EA)                          1       0         1        1        0          0        3
  [illegible]                                   0       2         0        0        0          0        2
  [illegible]                                   0       0         1        1        0          1        3
  [illegible]                                   1       0         1        0        1          0        3
  [illegible]                                   1       0         1        0        0          0        2
  [illegible]                                   0       0         1        1        0          0        2
  [illegible]                                   0       1         0        1        0          0        2
  [illegible]                                   0       0         0        0        0          1        1
  Regression equation                           0       0         1        0        0          0        1
  [illegible]                                   0       0         0        1        0          0        1
  [illegible]                                   0       0         1        0        0          0        1
  [illegible]                                   0       0         1        0        0          0        1
  Intelligence quotient (IQ)                    0       0         1        0        0          0        1
  Cumulative frequency                          0       0         1        0        0          0        1
  [illegible]                                   0       0         0        1        0          0        1
  Deviation from average                        0       1         0        0        0          0        1
  [illegible]                                   0       0         1        0        0          0        1

1 The journal Progressive Education was not included in this table because, as was shown in
Table I, it contained no articles using our measures.

2 The word "Measures" is understood to include statistical measures and technical terms such as
Reliability, or Weighted, etc.

3 These totals are to be thought of in terms of the three hundred forty-two articles which used
these measures—shown in Table I. If a measure is found once or n times in a given article, it is
recorded only once.


deviation of the mean, difference divided by the standard deviation 
of the difference, reliability, critical ratio, probable error of the mean, 
the probable error of a difference, quartiles, frequency, and rank. 
In the grouping of the primary data in Table II, the writer was
impressed with the rather widespread use of synonyms. This he
believes to be the product of a young science, experimental social
science. Synonyms such as mean and average, standard deviation
and standard error (strictly speaking this is incorrect), quartile
deviation and semi-interquartile range, reliability and consistency,
etc., to say nothing of the variation in statistical symbolization, were
persistently reappearing.

The writer should like to suggest a number of additions to and a 
few subtractions from the above list which was culled from Table II. 
The additions are: The use of measure of central tendency as a general 
term to include the mean, the median, etc.; Spearman’s Rho, because 
in many cases in education and psychology it is a question whether 
we have advanced any further than the ranking stage and therefore 
may have overworked the more refined correlation techniques; 
the coefficient of alienation (k), to be used to give meaning to the 
Pearson r; IQ; MA; CA; percentiles; cumulative frequency; norms 
(age and grade); Normalcy (G@); chances in one hundred that a dif- 
ference is significant; scattergram; and a difference divided by the 
probable error of the difference. The subtractions are: The coefficient 
of variation (V); and the critical ratio, because it is only a special 
case of the more general concept, the chances in so many that the 
difference is significant. These additions and subtractions it must 
be remembered are based upon the writer’s judgment. 

In conclusion, a survey of seven journals selected from a list of 
journals adjudged by the heads of departments of the New Jersey 
State Normal School at Newark to be of most value to teachers in 
the elementary schools, reveals that about four out of ten articles 
require some knowledge of statistics in order to be read intelligently. 
This study also indicates the statistical measures found, together with 
the writer’s recommendations for certain additions and subtractions. 
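
The proportions quoted in this survey can be recomputed from the "Total" row of Table I. The short Python sketch below does so; the dictionary `TOTALS` simply transcribes that row, while the helper name `per_cent` is hypothetical and is introduced here only for illustration.

```python
# (articles surveyed, articles using statistical measures), transcribed
# from the "Total" row of Table I.
TOTALS = {
    "Educational Administration and Supervision": (146, 42),
    "The Elementary School Journal": (126, 45),
    "The Journal of Educational Psychology": (149, 121),
    "The Journal of Educational Research": (125, 68),
    "The Journal of Educational Sociology": (90, 14),
    "Progressive Education": (141, 0),
    "The School Review": (121, 52),
}


def per_cent(surveyed: int, with_statistics: int) -> float:
    """Percentage of surveyed articles that use statistical measures."""
    return round(100.0 * with_statistics / surveyed, 1)


if __name__ == "__main__":
    for journal, (surveyed, with_statistics) in TOTALS.items():
        print(f"{journal:45s} {per_cent(surveyed, with_statistics):5.1f}")
    all_surveyed = sum(s for s, _ in TOTALS.values())
    all_with_statistics = sum(w for _, w in TOTALS.values())
    print(f"{'All seven journals':45s} {per_cent(all_surveyed, all_with_statistics):5.1f}")
```

The grand totals come to eight hundred ninety-eight articles, of which three hundred forty-two use statistics, or roughly four in ten.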




















WORDS MOST FREQUENTLY USED BY A 
FIVE-YEAR-OLD GIRL 


RICHARD STEPHEN UHRBROCK 


Wyoming, Ohio 


The daily experiences of a girl were dictated into an Ediphone 
during the six weeks preceding her fifth birthday. A sample of 
twenty-four thousand words was obtained. The child used eighteen 
hundred eighty-five different common words with varying frequencies, 
and five hundred twenty-six different proper nouns. The entire list 
of different common words used has been reported.¹

The subject, Margaret Ann, never used fewer than two hundred 
fifty-eight different words in any sample of one thousand words 
dictated, nor more than three hundred thirty-one words. In twenty- 
four samples of one thousand words each, the average number of 
different words used, per thousand, was two hundred ninety. Fifty- 
two new words, not previously used, appeared in the twenty-fourth 
thousand. 

Table I reveals a number of interesting and significant relationships. 
First, eighteen hundred eighty-five different common words were used 
with varying frequencies. There were seven hundred fifty different 
words used only one time each. These accounted for 39.79 per cent 
of the different words that were used. There were one hundred forty- 
one different words that were used twenty or more times. These 
accounted for 7.48 per cent of the different words that were used. 

A total of twenty-three thousand twenty-five common words was 
used by the child while dictating twenty-four thousand words. (The 
difference is accounted for by the five hundred twenty-six proper nouns 
which were distributed, in varying frequencies, throughout the data. 
These proper nouns were not tabulated.) Column 4 should be read as 
follows. In the total sample of twenty-three thousand twenty-five 
words dictated there were seven hundred fifty words, each of which 
occurred once. There were six hundred fifty-eight words, made up of
three hundred twenty-nine different words each of which recurred
twice. There were five hundred thirty-seven words, made up of one
hundred seventy-nine different words each of which recurred three
times, and so on.
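
The reading of column 4 given above (frequency multiplied by the number of different words at that frequency) can be sketched in code. This is an illustrative sketch only: it assumes a list of word tokens as input, lumps every word used `cap` or more times into one final row as Table I does, and the name `frequency_spectrum` is hypothetical.

```python
from collections import Counter


def frequency_spectrum(tokens, cap=20):
    """Build rows of (frequency, different words, per cent of different words,
    running words, per cent of running words); frequencies >= cap share one row."""
    word_counts = Counter(tokens)
    types_at = Counter(min(count, cap) for count in word_counts.values())
    total_types = len(word_counts)
    total_tokens = len(tokens)
    rows = []
    for freq in sorted(types_at):
        n_types = types_at[freq]
        if freq < cap:
            # Column 4 = column 1 x column 2 for an exact frequency.
            n_tokens = freq * n_types
        else:
            # The "cap plus" row sums the actual counts of its words.
            n_tokens = sum(c for c in word_counts.values() if c >= cap)
        rows.append((freq, n_types, 100 * n_types / total_types,
                     n_tokens, 100 * n_tokens / total_tokens))
    return rows


# Toy illustration with a cap of 3 in place of 20.
sample = "the cat and the dog and the bird".split()
for row in frequency_spectrum(sample, cap=3):
    print(row)
```

In the toy sample three words occur once (three running words) and one word occurs twice (two running words), which mirrors the way the seven hundred fifty and six hundred fifty-eight entries of column 4 arise.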





¹ Uhrbrock, Richard Stephen: “The Vocabulary of a Five-Year-Old.”
Educational Research Bulletin, Vol. XIV, April 17, 1935, pp. 85-97.
It is particularly revealing to observe that three-quarters of the
dictated material involved the use of only one hundred forty-one
different common words, each of which was used twenty or more times.
The complete list of the one hundred forty-one most frequently used
words in this sample of twenty-four thousand words of dictated
material appears in Table II.


TABLE I. ANALYSIS OF VOCABULARY OF A FIVE-YEAR-OLD GIRL

              Eighteen hundred eighty-five    Twenty-three thousand twenty-
              words used one or more times    five words dictated

Frequency     Different                       Words (column
              words          Per cent         1 × column 2)    Per cent
   (1)          (2)            (3)                 (4)            (5)

    1           750           39.79                750            3.26
    2           329           17.45                658            2.86
    3           179            9.50                537            2.33
    4           131            6.95                524            2.27
    5            75            3.98                375            1.63
    6            49            2.60                294            1.28
    7            40            2.12                280            1.22
    8            29            1.54                232            1.01
    9            20            1.06                180            0.78
   10            21            1.11                210            0.91
   11            23            1.22                253            1.10
   12            14            0.74                168            0.73
   13            17            0.90                221            0.96
   14             9            0.48                126            0.55
   15            15            0.80                225            0.98
   16            13            0.69                208            0.90
   17             4            0.21                 68            0.29
   18            11            0.58                198            0.86
   19            15            0.80                285            1.24
   20 plus      141            7.48             17,233           74.84

   Total      1,885          100.00             23,025          100.00
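
The totals and the "three-quarters" figure can be re-derived from the rows of Table I. The sketch below is a check on the printed values, not part of the original study; it takes the entry for frequency 14 to be 9 different words, since 14 × 9 gives the 126 shown in column 4, and the variable names are illustrative.

```python
# (frequency, different words) for rows 1 through 19 of Table I.
ROWS = [(1, 750), (2, 329), (3, 179), (4, 131), (5, 75), (6, 49), (7, 40),
        (8, 29), (9, 20), (10, 21), (11, 23), (12, 14), (13, 17), (14, 9),
        (15, 15), (16, 13), (17, 4), (18, 11), (19, 15)]

# The "20 plus" row, copied from the table.
TYPES_20_PLUS = 141
TOKENS_20_PLUS = 17233

# Column 2 total: all different common words.
total_types = sum(n for _, n in ROWS) + TYPES_20_PLUS
# Column 4 total: frequency x different words, plus the "20 plus" row.
total_tokens = sum(f * n for f, n in ROWS) + TOKENS_20_PLUS

share_types = round(100 * TYPES_20_PLUS / total_types, 2)
share_tokens = round(100 * TOKENS_20_PLUS / total_tokens, 2)

print(total_types, total_tokens)   # 1885 23025
print(share_types, share_tokens)   # 7.48 74.84
```

On these figures, about seven and one-half per cent of the different words account for about three-quarters of the running words.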
TABLE II. FREQUENCIES OF ONE HUNDRED FORTY-ONE WORDS APPEARING
TWENTY OR MORE TIMES IN A TWENTY-FOUR THOUSAND-WORD SAMPLE
DICTATED BY A FIVE-YEAR-OLD GIRL

[Table II: three paired columns of Word (1) and Frequency (2). Apart
from the entries below, the words are illegible in this copy.]

Word            Frequency
afternoon           24
back (adv)          92
breakfast           45
dictaphone          38
grapefruit          30

SUMMARY

1. A girl dictated twenty-four thousand words into an Ediphone
during the six weeks preceding her fifth birthday.

2. The number of different words per one-thousand words sampled
ranged from two hundred fifty-eight to three hundred thirty-one.
The average number of different words per thousand was two hundred
ninety.

3. There were eighteen hundred eighty-five different common
words and five hundred twenty-six different proper nouns used with
varying frequencies.

4. Three-quarters of the dictated material involved the use of only
one hundred forty-one different common words, each of which recurred
twenty or more times.
CLASSES, LARGE OR SMALL? A REJOINDER


AUSTIN B. WOOD 
Brooklyn College, Brooklyn, N. Y. 


From a study reported in this Journal, Hartmann¹ concludes that
“the alleged benefits of greater student-teacher contacts are either
fictitious or effective in some other way than in the overt
manifestations of scholarship.” On the face of it, this study would seem to offer
scientific support to the thesis that student groups (classes) may be 
increased in size indefinitely without reduction in the effectiveness of 
teaching. Because a conclusion of this sort may well come to serve as 
justification for a policy of having larger and larger classes, with the 
result that they may not in fact allow as effective teaching as is possible 
in the smaller class, it is thought desirable to indicate a few criticisms 
of the study which may tend to limit the conclusions derived from it 
much more than was done by the author. 

In the first place, although the student group was large, there was 
but one instructor involved in the entire experiment. Had there 
been others, or even one other, the conclusions might have been quite 
different, one way or the other. 

Secondly, there was but one type of “student-teacher contact,”
and that, to judge from the author’s description, a highly formalized 
and infrequent one. Furthermore, the single hour per term which 
the members of group B had in private conference was no less than the 
time that the more aggressive or interested students will obtain after 
class and in and out of office hours no matter what the size or method 
of handling the class. 

Thirdly, in formulating the problem, the author has overlooked 
completely the possibility that the important thing here may not be 
some individual relationship between one student and one teacher, 
but a form of group relationship which characterizes small and intimate 
groups (i.e., a teacher and a group of a few students) and does not
characterize either a group of two (one teacher, one student) or a 
large group (one teacher, many students). Anyone who believes in the 
pedagogical value of discussion among students will see the importance 
of this point. It may very well be that the “light rein” by which a
teacher can guide the efforts of the students themselves to think out
loud in a small class is a type of “student-teacher contact” which,
if present, would have completely reversed the conclusions of this
study. Such discussion is obviously impossible in the larger classes.

¹ Hartmann, G. W.: “Comparative gains under individual conference and
classroom instruction.” Journal of Educational Psychology, Vol. XXVI, 1935,
pp. 367-372.

Finally, it seems as though the author has rather “stacked the 
deck” in favor of the A group in that he meets them, albeit en masse, 
some forty hours, whereas the members of the B group are met only 
about twenty-five hours (twenty-four in class, one in conference) in the 
same period. 

It would seem to the writer of this note that the only conclusion 
justifiable by the data of this experiment is that one certain instructor 
taught one group of students more effectively in forty hours of class 
work than he did another group in twenty-four hours of class work 
and one hour per student of private individual conference of the 
nature described. 





